Design of high-speed equalizing drivers for long-wavelength vertical-cavity surface-emitting lasers by Soenen, Wouter


Ontwerp van egaliserende hogesnelheidszenders
voor verticaal-stralende laserdiodes met lange golflengte
Design of High-Speed Equalizing Drivers
for Long-Wavelength Vertical-Cavity Surface-Emitting Lasers
Wouter Soenen
Supervisors: prof. dr. ir. J. Bauwelinck, prof. dr. ir. X. Yin
Dissertation submitted to obtain the degree of
Doctor of Electrical Engineering
Department of Information Technology
Chair: prof. dr. ir. B. Dhoedt
Faculty of Engineering and Architecture
Academic year 2017-2018
ISBN 978-94-6355-096-3
NUR 959
Legal deposit: D/2018/10.500/14


Members of the Examination Board
prof. dr. ir. Filip De Turck (Chairman)
INTEC, Ghent University, Belgium
prof. dr. ir. Geert Van Steenberge (Secretary)
ELIS, Ghent University, Belgium
prof. dr. ir. Johan Bauwelinck (Supervisor)
INTEC, Ghent University, Belgium
prof. dr. ir. Xin Yin (Supervisor)
INTEC, Ghent University, Belgium
prof. dr. ir. Geert Morthier
INTEC, Ghent University, Belgium
prof. dr. ir. Filip Tavernier
MICAS, KU Leuven, Belgium
dr. ir. Paraskevas Bakopoulos
Mellanox Technologies, Israel

Acknowledgments
When I started this PhD in 2012, I could only hope for the path that I have
travelled over the years. Admittedly, it was not always smooth sailing, but
in the end I can be proud of the work I have delivered. I certainly do not
regret my choice at the time to start a doctoral programme, because it has a
few unique aspects that are hard to find in any other job. The opportunity to
compete directly with leading academic institutes and being master of your
own schedule, by day or by night, are just a few of them. What will perhaps
stay with me the most is the tension of submitting a conference paper a few
minutes before the deadline, and then hearing the redeeming answer from the
review committee a couple of months later. Most importantly, however, it has
strongly fostered my personal development; not only as an engineer and as a
scientist, but also as a person. Your resilience and perseverance are put to
the test many times, and it is precisely those moments that bring about a
good outcome. As the wise master Yoda once said: “Do. Or do not. There is no
try.” In short, if you do not give it the full 100%, you need not even begin.
This PhD would of course not have been possible without the support of my
supervisors prof. Johan Bauwelinck and prof. Xin Yin. Thanks to them I got
the chance to work, within the framework of two European projects, on a
topical subject concerning the driving of long-wavelength vertical-cavity
surface-emitting laser diodes. This allowed me to attend high-level
international conferences, naturally accompanied by some ‘obligatory’
tourism. The financial support for the fabrication of the four designed
microchips, as well as for the high-end and extremely expensive measurement
equipment required to characterize these microchips (which is why, after only
a few months at INTEC Design, I spontaneously started counting in k€), was
arranged by my supervisors, but also by the Faculty of Engineering as well as
by the INTEC department and the overarching research groups IBCN and imec.
The contribution of prof. em. Jan Vandewege should not be forgotten either,
since he stood at the cradle of INTEC Design.
I may also count myself lucky that I found a place in a pleasant, flexible
and instructive working environment. I have learned a lot from colleagues,
both at work and next to it. And ‘next to it’ can be taken quite literally
for the Technicum era, because the countless pub evenings at the Marimain and
the resulting dinners with dr. Christophe Van Praet, dr. Jochen Verbrugghe,
prof. Guy Torfs, ir. Bart Moeneclaey, ir. Koen Verheyen and ir. Haolin Li
were always extremely entertaining, yet sometimes extremely tough the day
after. A fresh batch of ambitious and fun PhD students, followed by the move
to the Technologiepark in Zwijnaarde, brought a breath of fresh air into the
lab. Their fresh look on things sometimes led to somewhat noisy discussions,
which now and then even degenerated into an outright Mexican standoff.
Regardless of the work location, there was always, though not every working
day, one constant present, namely Bart. Our adjoining desks and the
collaboration on the European projects were the start of many adventurous and
pleasant business trips, ranging from Germany to Spain to America, and I am
extremely grateful that he was my travel companion. Furthermore, I wish to
thank ing. Jan Gillis for regularly resuscitating my Linux computer, ir.
Joris Lambrecht for the design of several extremely reconfigurable PCBs and
above all dr. Renato Vaernewyck, who guided and assisted me with the chip
design.
Finally, I want to thank my parents for the moral and financial support of my
studies, as well as my sister, who was always there in times of need. The
last word of thanks goes to my soulmate Katrien. For 5.5 years she has had to
put up with my irregular work and biorhythm, riddled with bouts of stress.
Fortunately, she brought me some structure and was always understanding.
But let me not keep you in suspense any longer before you devour the rest of
this book...
Ghent, January 2018
Wouter Soenen
Table of Contents
Glossary
Dutch Summary
English Summary
List of Publications
1 Introduction
1.1 Background
1.2 Data center intraconnects targeting 400 Gb Ethernet
1.3 Recent advances in VCSEL-based transmitters
1.3.1 Short- and long-wavelength VCSELs
1.3.2 VCSEL drivers
1.4 Outline of the dissertation
2 Modeling long-wavelength VCSELs
2.1 Structure of long-wavelength VCSELs
2.2 Mapping an electrical network to the physical structure
2.3 Extraction of component values
2.3.1 Electrical access network
2.3.2 Optical resonator
2.4 Large signal model
2.4.1 Shockley diode
2.4.2 Non-linear optical resonator
2.4.3 Rate equations
2.5 Conclusion
3 Co-optimization between the driver-VCSEL interface and on-chip feed-forward equalization
3.1 Modeling the driver-VCSEL interface
3.2 Equalization of the channel in function of the driver impedance
3.3 Verification of the equalized driver-VCSEL interface
3.4 Conclusion
4 PAM-4 VCSEL driver with a 4-tap FFE output stage
4.1 Architecture
4.1.1 Specifications
4.1.2 Channel architecture
4.2 Retiming
4.2.1 Introducing the synchronization window
4.2.2 Troubleshooting the first generation retiming block
4.2.3 Deriving flip-flop specifications
4.2.4 Cascading multiple latches
4.2.5 Avoiding the setup and hold region with delay stages
4.2.6 Cleaning up the delayed signal with a CTLE stage
4.2.7 Buffering the clock signal
4.3 4-tap PAM-4 driver architecture
4.3.1 Output driver
4.3.2 One mirror to rule them all
4.3.3 High-frequency modeling of current-routing bus
4.4 Asymmetric pulse shaping for multi-level signals
4.4.1 Applying pulse width distortion
4.4.2 Selective falling-edge pre-emphasis
4.5 Experiments
4.5.1 A 2-tap FFE transmitter IC modulating a 1.5 µm VCSEL with NRZ and PAM-4
4.5.2 Electrical characterization of the 4-tap FFE PAM-4 driver
4.5.3 Evaluation of the selective falling-edge pre-emphasis circuit
4.5.4 Effect of a 4-tap FFE on a PAM-4 modulated 1.5 µm VCSEL at 25 GBd and 28 GBd
4.6 Conclusion
5 Pseudo-balanced regulated VCSEL driver
5.1 DC-coupled vs. AC-coupled laser driver
5.2 DC-coupled laser driver with output common-mode voltage control
5.3 Pseudo-balanced regulated laser driver
5.3.1 Concept
5.3.2 2-tap equalizer output stage
5.3.3 Common-mode output voltage regulated predriver
5.4 Channel architecture
5.5 Experiments
5.5.1 Electrical characterization
5.5.2 A 40 Gb/s 1.5 µm VCSEL link operating at a single supply voltage of 2.5 V
5.6 Conclusion
6 Conclusions and some prospects
A Tunnel diode operation
A.1 Simplified band theory of solids
A.2 Tunnel diode
B VCSEL models in Verilog-A
B.1 Non-linear lumped element electrical access network and optical resonator
B.2 Transformed rate equations
Bibliography

Glossary
fT transition frequency.
m drive efficiency.
BER bit-error ratio.
BTB back-to-back.
BTJ buried tunnel junction.
CDR clock and data recovery.
CID consecutive identical digits.
CTLE continuous-time linear equalizer.
DCI data center interconnection.
DFB distributed-feedback.
ER extinction ratio.
FEC forward error correction.
FFE feed-forward equalization.
GbE Gigabit Ethernet.
IC integrated circuit.
ISI intersymbol interference.
MAN metropolitan area network.
MMF multi-mode fiber.
MPO multi-fiber push-on.
NRZ non-return-to-zero.
OMA optical modulation amplitude.
PAM-4 four-level pulse amplitude modulation.
PD photodiode.
PRBS pseudorandom bit sequence.
PWD pulse width distortion.
RS Reed-Solomon.
SC short-cavity.
SDM spatial division multiplexing.
SMF single-mode fiber.
SWDM short-wavelength division multiplexing.
TIA transimpedance amplifier.
TUM Technische Universität München.
VCSEL vertical-cavity surface-emitting laser.
WAN wide area network.
WDM wavelength division multiplexing.
Dutch Summary
The way everyone interacts with the Internet has changed drastically over the
last five to ten years. Direct access to our online social networks and
favorite web pages is nowadays considered normal. The launch of Facebook and
YouTube in 2004 and 2005 allowed people to share their personal lives through
videos and photos, which certainly accelerated the popularity of smartphones.
In 2017, Facebook counts two billion monthly active users and on average 300
hours of multimedia are uploaded to YouTube every minute. Global internet
services, also called the ‘cloud’, are delivered by telecom companies that
have data centers spread around the world at their disposal. These centers
communicate with each other in a so-called wide area network (WAN), which
bridges multiple continents by means of submarine optical fiber links. Per
continent or country, data centers are further grouped into metropolitan area
networks (MAN), with the main goal of creating synchronous copies to improve
fault tolerance or of forming a larger virtual data center. Typically, the
distance of a MAN data center link is limited to less than 100 km to keep the
latency of the link within bounds.
Communication inside a data center facility takes place between server racks
and network switches, where each rack can house forty servers or more. The
interface to the server rack is made through optical fiber cable, while the
interior of the rack can be wired with both copper and fiber links. Copper
links are limited in distance to 10 m because of the bandwidth limitation
inherent to the medium, whereas distances up to 100 m can be bridged with
multi-mode fiber (MMF). Single-mode fiber (SMF), on the other hand, is
deployed in MAN and WAN networks because of the required kilometer range. It
should also be kept in mind that copper and SMF links are respectively the
cheapest and the most expensive solution, with MMF in between as the golden
mean. Leaving cost and transmission distance aside, optical fiber cables are
still preferred over copper cables. This is mainly because they are lighter
and more flexible, which benefits the airflow, and therefore lead to a
scalable and future-proof deployment of server racks in data centers. Each
optical fiber link consists of a transmitter that modulates a light source in
order to convert electrical information into light. After travelling a
certain distance over MMF or SMF, this light is converted into an electrical
voltage at the receiver side by the cascade of a photodetector and a
transimpedance amplifier.
Since these data center infrastructures together consume about 3% of the
worldwide electricity production, it is evident that their ecological impact
should not be underestimated. Hence, there is a strong need for research into
scalable and energy-efficient or ‘green’ intra-data center links, to meet the
need to process more and more information worldwide. This naturally
introduces the primary goal of this thesis, namely developing and evaluating
faster, more energy-efficient transmitters, applicable in the next generation
of data communication modules. Data center traffic mainly uses the
packet-switched Ethernet protocol. Data rates between the server and the
network switch are typically 10 Gb/s, and between network switches typically
40 Gb/s. Currently, a gradual transition to 25 Gb/s/100 Gb/s technology is
taking place, which in the long term will in turn be replaced by
100 Gb/s/400 Gb/s technology. Optical MMF links using directly, binary
modulated vertical-cavity surface-emitting lasers (VCSELs) that generate
light at a short wavelength of 850 nm currently dominate the data center
market, with commercial modules available at 25 Gb/s per lane. The
transmission distance over OM4 MMF is however limited to 100 m because of the
negative influence of modal dispersion on the received signal. Longer
distances, up to 2 km as required by the 400GBASE-FR8 Ethernet standard, can
be bridged over SMF, which requires long-wavelength transmitters
(1.3-1.5 µm). For these applications, directly modulated edge-emitting lasers
are the norm today, such as the reliable but power-hungry distributed-feedback
laser diode. In an attempt to strongly reduce the power consumption, it is
therefore worthwhile to investigate the potential of long-wavelength VCSELs
for intra-data center links up to 2 km. To this end, application-specific
integrated transmitter circuits need to be designed to support data rates of
40 Gb/s and beyond, in the context of the 400 Gigabit Ethernet standards.
Besides the current situation and emerging trends in data center networks,
Chapter 1 also discusses the evolution of the modulation bandwidth of short-
and long-wavelength VCSELs over a time span of ten years. The relevance of
the work delivered during this PhD is framed in an overview of
state-of-the-art VCSEL transmitters. Throughout this dissertation, several
development aspects of the proposed VCSEL transmitters are addressed, which
can equally be integrated in a more general design process for integrated
laser transmitters. First of all, an equivalent simulation model is built of
a 1.5 µm InP-based VCSEL with buried tunnel junction, whose model parameters
are fitted to small-signal modulation measurements and to curves showing the
optical power and the voltage as a function of the bias current through the
laser diode. Eventually, a comprehensive model is established based on
coupled intrinsic laser diode equations, which is able to accurately mimic
the non-linear modulation behavior of the VCSEL. The next chapter deals with
the electrical interface between the output stage of the transmitter and the
VCSEL, aiming to optimize the electro-optic modulation response. The full
design space, determined among others by the output impedance of the
transmitter, the inductance of the interconnect, the equalization parameters
and the bias current of the laser, is explored in order to find an optimal
trade-off between performance and power consumption. Chapters 4 and 5 go
deeper into the design of integrated circuits to drive the VCSELs: first a
transmitter that uses four-level pulse amplitude modulation (PAM-4) and then
a transmitter that is limited to on-off modulation of the laser. These
microchips were developed within the framework of the European FP7 research
projects MIRAGE and PHOXTROT. A methodology for the design of retiming
flip-flops and a reusable generic current mirror are discussed in the context
of the PAM-4 output stage equipped with a four-tap equalizer. Next, several
techniques are covered that attempt to compensate for the non-linear
modulation behavior of the VCSEL. A common drawback of transmitters that
drive the cathode of a laser is the presence of multiple supply voltages. A
concept and architecture are proposed in Chapter 5 that make it possible to
operate a VCSEL transmitter from a single supply voltage without sacrificing
high-speed performance. This concept was also filed as a patent application.
Both chapters end with a summary of the conducted experiments, which were
presented at conferences and published in scientific journals. Finally,
Chapter 6 concludes with a critical overview of the delivered work, followed
by a personal view on the next generation of intra-data center links.
English Summary
The way people use the Internet has changed drastically over the last five
to ten years. Immediate access to our online social networks or favorite
web pages is taken for granted these days. The launch of Facebook and
YouTube back in 2004 and 2005 enabled people to share their personal
lives through videos and pictures, thereby accelerating the popularity of
smartphones. In 2017, Facebook reached two billion monthly active
users while 300 hours of content were uploaded to YouTube every minute.
Global internet services, also referred to as ‘the cloud’, are provided by
internet service providers that deploy data centers spread all over the world.
These data centers communicate with each other in a so-called wide area
network (WAN), spanning multiple continents through submarine networks.
Locally, a few data centers are placed in a metropolitan area network (MAN)
to enable synchronous replication that increases the fault tolerance or to
create a larger virtual data center. Typically, the length of a MAN data
center interconnection (DCI) is restricted to less than 100 km because of
latency requirements.
The communication inside the data center facility occurs between server
racks and network switches, where each rack can contain forty servers or
more. The interface to the rack is made through optical fiber, while the
interior wiring can be a mixture of copper and fiber. Copper interconnects
are limited to less than 10 m due to the limited bandwidth of the medium,
while data center intraconnects up to 100 m utilize multi-mode fiber (MMF).
Single-mode fiber (SMF), on the other hand, is deployed in MAN and WAN DCI
because of the required km-range. It is important to note that the
installation cost increases when moving from a copper to an MMF to an SMF
solution. Cost and link length considerations aside, optical fiber is
preferred over copper because it is lighter and less bulky, which improves
the airflow, and supports a scalable and future-proof deployment of server
racks inside the data center. Each optical fiber link consists of a
driver/transmitter modulating a light source to convert the electrical data
into light. After propagating through an SMF or MMF interconnect, the light
is converted back to an electrical voltage at the receiver side by a
photodetector cascaded with a transimpedance amplifier.
Since all these data center facilities together consume up to 3% of the
global electricity production, the ecological impact cannot be neglected.
Hence, there is a strong need to investigate scalable and energy-efficient or
‘green’ alternatives for interconnections inside data centers, to cope with
the information capacity requirements of the future. This brings us directly
to the primary purpose of this thesis, i.e. to explore and evaluate faster,
energy-efficient transmitters that can be deployed in next-generation data
communication modules. Data center traffic is mostly based on the Ethernet
packet switching protocol and is currently transitioning from
10 Gb/s/40 Gb/s to 25 Gb/s/100 Gb/s and 100 Gb/s/400 Gb/s (server-to-switch
speed/switch-to-switch speed). Optical links based on directly, binary
modulated short-wavelength (850 nm) vertical-cavity surface-emitting lasers
(VCSELs) transmitting over MMF dominate the data center market today,
operating at 25 Gb/s per lane. However, modal dispersion limits the
transmission distance to 100 m over OM4 MMF. Longer link lengths, up to 2 km
as required by the 400GBASE-FR8 Ethernet standard, are covered by SMF but
require long-wavelength transmitters (1.3-1.5 µm). For these 2 km
applications, the commodity directly modulated lasers of today are reliable
but power-hungry edge-emitting lasers, like the distributed-feedback laser
diode. In order to significantly improve the energy efficiency, it is
worthwhile investigating the potential of long-wavelength VCSELs for
long-reach data center intraconnects. To that end, a dedicated driver
integrated circuit needs to be developed that accommodates data rates of
40 Gb/s and beyond, in line with the 400 Gigabit Ethernet standards.
Aside from the current situation and upcoming trends in data center networks,
Chapter 1 describes the evolution of the modulation bandwidth of both short-
and long-wavelength VCSELs over the past ten years. The relevance of this
work is established there by comparing it with the state of the art in VCSEL
transmitters. Throughout this dissertation, several aspects of the
development of the respective high-speed VCSEL drivers are covered, which
can also be integrated in a more general design flow for laser drivers. First
of all, an equivalent simulation model is created of the load, a 1.5 µm
InP-based VCSEL with buried tunnel junction, of which the model parameters
are fitted to small-signal modulation measurements and to curves that express
the static optical power and voltage as a function of the current flowing
through the VCSEL. In the end, a comprehensive model is created based on
coupled laser diode rate equations that accurately captures the non-linear
modulation behavior of the VCSEL. The next chapter deals with the electrical
interface between the driver's output stage and the VCSEL in order to
optimize the complete electro-optic modulation response. The entire design
space, determined by the driver's output impedance, interconnect inductance,
on-chip equalization and bias current of the VCSEL, is explored to find the
optimal trade-off between performance and power dissipation. Chapters 4 and 5
elaborate on the design of integrated circuits to drive the VCSEL: either
with a four-level pulse amplitude modulation (PAM-4) driver or with an on-off
binary modulating driver. These microchips were developed as part of the
European FP7 research projects MIRAGE and PHOXTROT. A design methodology for
retiming flip-flops and a generic reusable current mirror are discussed in
light of the symbol-spaced 4-tap equalizing PAM-4 output stage. In addition,
multiple techniques are described to compensate for the non-linear modulation
behavior of the VCSEL. A common property of cathode-drive laser drivers is
the presence of multiple supply voltages. Chapter 5 proposes a concept and
architecture that enable the VCSEL driver to run off a single supply voltage
while preserving high-speed performance. This concept was also filed as a
patent application. Both chapters end with a summary of the conducted
experiments published in conference proceedings and journal papers. Finally,
Chapter 6 concludes with a critical overview of the presented work and some
prospects for next-generation data center intraconnects.

List of Publications
Publications in international journals
• W. Soenen, R. Vaernewyck, X. Yin, S. Spiga, M.-C. Amann, K. S. Kaur,
P. Bakopoulos and J. Bauwelinck, “40 Gb/s PAM-4 Transmitter IC
for Long-Wavelength VCSEL Links,” in IEEE Photonics Technology
Letters, vol. 27, no. 4, pp. 344-347, Feb. 2015.
• J. Verbist, J. Zhang, B. Moeneclaey, W. Soenen, J. V. Weerdenbrug,
R. V. Uden, C. Okonkwo, G. Roelkens and J. Bauwelinck, “A 40-
GBd QPSK/16-QAM Integrated Silicon Coherent Receiver,” in IEEE
Photonics Technology Letters, vol. 28, no. 19, pp. 2070-2073, Oct.
2016.
• S. Spiga, W. Soenen, A. Andrejew, D. Schoke, X. Yin, J. Bauwelinck,
G. Boehm and M.-C. Amann, “Single-Mode High-Speed 1.5-µm
VCSELs,” in Journal of Lightwave Technology, vol. 35, no. 4, pp.
727-733, Feb. 2017.
• W. Soenen, J. Lambrecht, X. Yin, S. Spiga, M.-C. Amann, G. V. Steenberge,
P. Bakopoulos and J. Bauwelinck, “PAM-4 VCSEL driver with
selective falling-edge pre-emphasis,” in Electronics Letters, vol. 54,
no. 3, pp. 155-157, Feb. 2018.
• A. Malacarne, C. Neumeyr, W. Soenen, F. Falconi, C. Porzi, T. Aalto,
J. Rosskopf, J. Bauwelinck and A. Bogoni, “Optical Transmitter based
on 1.3-µm VCSEL and SiGe Driver Circuit for Short Reach and
Beyond,” in Journal of Lightwave Technology, vol. 99, no. PP, pp.
1-10, Dec. 2017.
• H. Ramon, M. Vanhoecke, J. Verbist, W. Soenen, P. D. Heyn, Y. Ban,
M. Pantouvaki, J. V. Campenhout, P. Ossieur, X. Yin and J. Bauwelinck,
“Low-Power 56 Gb/s NRZ Microring Modulator Driver in 28 nm
FDSOI CMOS,” in IEEE Photonics Technology Letters, vol. PP, no.
99, pp. 1-1, Jan. 2018.
Publications in international conferences
• W. Soenen, R. Vaernewyck, A. Vyncke, J. Verbrugghe, J. Gillis and
J. Bauwelinck, “Evaluation of a discrete 4-PAM optical link for future
automotive networks,” in Proceedings of the 2012 Annual Symposium
of the IEEE Photonics Society Benelux Chapter, Mons, Belgium, Nov.
2012, pp. 69-72.
• J. Bauwelinck, R. Vaernewyck, J. Verbrugghe, W. Soenen et
al., “High-Speed Electronics for Short-Link Communication,” in 39th
European Conference and Exhibition on Optical Communication,
Proceedings, London, UK: VDE, Sept. 2013, Mo.4.F.4, pp. 164-166.
• R. Vaernewyck, J. Verbrugghe, W. Soenen, B. Moeneclaey, G. Torfs,
X. Yin and J. Bauwelinck, “High-speed electronic integrated circuits
for metro, access and data center networks,” in Workshop Optical
Interconnect in Data Centers (EPIC 2014), Berlin, Germany, Mar.
2014.
• B. Moeneclaey, G. Kanakis, J. Verbrugghe, N. Iliadis, W. Soenen et
al., “A 64 Gb/s PAM-4 linear optical receiver,” in 2015 Optical Fiber
Communications Conference and Exhibition (OFC), Los Angeles, CA,
USA: OSA, Apr. 2015, M3C.5, pp. 1-3.
• J. Bauwelinck, W. Soenen et al., “On-Chip transmitter and
receiver front-ends for ultra-broadband wired and optical-fiber com-
munications,” in 2016 Optical Fiber Communications Conference and
Exhibition (OFC), Anaheim, CA, USA: OSA, 2016, Tu3B.1, pp. 1-3.
• J. Verbist, J. Zhang, B. Moeneclaey, W. Soenen, J. V. Weerdenbrug,
R. V. Uden, C. Okonkwo, G. Roelkens and J. Bauwelinck, “A 40 Gbaud
integrated silicon coherent receiver,” in 18th European Conference in
Integrated Optics 2016, Warsaw, Poland, May 2016, pp. 1-2.
• W. Soenen, R. Vaernewyck, S. Spiga, M.-C. Amann, P. Bakopoulos,
E. Mentovich, G. V. Steenberge, X. Yin and J. Bauwelinck, “A 2x50
Gb/s PAM-4 driver IC integrated with a 1.5 µm VCSEL array,” in
Proceedings of the 21st Annual Symposium of the IEEE Photonics
Benelux Chapter, Ghent, Belgium, Nov. 2016, pp. 87-90.
• W. Soenen, R. Vaernewyck, X. Yin, S. Spiga, M.-C. Amann, G. V. Steenberge,
E. Mentovich, P. Bakopoulos and J. Bauwelinck, “56 Gb/s
PAM-4 driver IC for long-wavelength VCSEL transmitters,” in (2016)
42nd European Conference on Optical Communications (ECOC),
Dusseldorf, Germany: VDE, Sept. 2016, Th.1.C.4, pp. 980-982.
• W. Soenen, B. Moeneclaey, X. Yin, S. Spiga, M.-C. Amann, C.
Neumeyr, M. Ortsiefer, E. Mentovich, D. Apostolopoulos, P. Bakopou-
los and J. Bauwelinck, “A 40-Gb/s 1.5-µm VCSEL link with a low-
power SiGe VCSEL driver and TIA operated at 2.5V,” in 2017
Optical Fiber Communications Conference and Exhibition (OFC),
Los Angeles, CA, USA: OSA, Mar. 2017, pp. 1-3.
• W. Soenen, B. Moeneclaey, X. Yin and J. Bauwelinck, “ADE as-
sembler flow for rapid design of high-speed low-power circuits,” in
Proceedings of CDNLive, Cadence User Conference 2017 EMEA,
Munich, Germany, May 2017.
• A. Malacarne, F. Falconi, C. Neumeyr, W. Soenen, C. Porzi, T. Aalto,
J. Rosskopf, M. Chiesa, J. Bauwelinck and A. Bogoni, “Low-Power
1.3-µm VCSEL Transmitter for Data Center Interconnects and Beyond,”
in 43rd European Conference on Optical Communication (ECOC),
Gothenburg, Sweden: VDE, Sept. 2017, pp. 1-3.
• A. Elmogi, W. Soenen, E. Bosman, J. Missinne, S. Spiga, M.-
C. Amann, J. Bauwelinck and G. V. Steenberge, “Flexible hybrid
integration of photonic and electronic chips using aerosol-jet printing,”
in Proceedings of the 22nd Annual Symposium of the IEEE Photon-
ics Society Benelux Chapter, Delft, The Netherlands, Nov. 2017,
pp. 82-85.
Patents
• W. Soenen and X. Yin, “Pseudo-balanced driver,” European Patent
Application EP16177056, Jun. 30, 2016.

1 Introduction
This first chapter provides the motivation for the research conducted over
the past five years. First, the historical evolution of internet traffic and
consumption is presented. To sustain the growth of internet traffic,
several measures need to be taken. One of these measures is increasing
the data rate inside the data centers, which form the hotspots of the global
internet network. The primary purpose of this thesis is to explore and
evaluate faster, energy-efficient alternatives to the currently deployed data
communication interconnects inside these buildings. A brief overview
of the optical transmitters that were developed during this work is
presented alongside the state of the art in vertical-cavity surface-emitting
lasers (VCSELs) and driver circuits to situate the relevance of the research.
The chapter concludes with the outline of the dissertation.
1.1 Background
The way people use the Internet has changed drastically over the last five to
ten years. Immediate access to our online social networks or favorite web
pages is taken for granted these days. The launch of Facebook and
YouTube back in 2004 and 2005 enabled people to share their personal lives
through videos and pictures, thereby accelerating the popularity of
smartphones. As of 2017, Facebook has reached 2 billion monthly active users
while 300 hours of content are uploaded to YouTube every minute [1,2]. The
fact that a massive amount of information is accessible at any time
has forced the entertainment industry to change its model of media
distribution to a content-on-demand approach. Music streaming services
such as Spotify have taken over the revenue share coming from physical
music media, while high-definition video streaming through e.g. Netflix or
Amazon is gaining popularity over costly pay-TV subscriptions [3]. Other
applications responsible for the growth of internet traffic are eSports, cloud
storage and computing, and financial transactions, all of course accessible
through cellular and wired networks.

Table 1.1: The Cisco Visual Networking Index forecast: historical internet context [4]
Year    Global Internet Traffic     Relative growth
1992    100GB per day
1997    100GB per hour              ×24
2002    100GB per second            ×3600
2007    2000GB per second           ×20
2016    26 600GB per second         ×13.3
2021    105 800GB per second        ×4
Although the growth of global internet traffic is slowing down, according
to the forecast from Cisco summarized in Table 1.1, it is nevertheless still
rapidly increasing. Global internet services (also referred to as ‘the cloud’)
are provided by internet service providers that deploy data centers spread all
over the world. These data center infrastructures can reach warehouse-scale
dimensions, of which an example is shown in Fig. 1.1, and can cover up to
0.6 km² [7]. They communicate with each other in a so-called wide area
network (WAN), spanning multiple continents through submarine networks.
Locally, a few data centers are placed in a metropolitan area network (MAN)
to enable synchronous replication that increases the fault tolerance or to
create a larger virtual data center; see Fig. 1.2 for an overview. Typically, the
length of a MAN data center interconnection (DCI) is restricted to
less than 100 km because of latency requirements [8].

Figure 1.1: Exterior and interior of a data center [5, 6].
Figure 1.2: Top-down view of the components that enable the internet services, descending
from the data center buildings into the server racks into the optical fiber link.
The communication inside the data center facility occurs between server
racks and network switches, where each rack can contain up to forty servers
or even more. Nowadays, the servers in a rack are connected to a network
switch in a top-of-rack configuration. The interface to the rack occurs
through optical fiber while the interior wiring can be a mixture of copper
and fiber. Basically, copper interconnects are limited to less than 10m due
to the limited bandwidth of the medium while data center intraconnects up
to 100m utilize multi-mode fiber (MMF). On the other hand, single-mode
fiber (SMF) is implemented in MAN and WAN DCI because of the required
km-range. Important to note is that the installation cost increases accordingly
when transitioning from copper to MMF to SMF. Cost and link length
considerations aside, optical fiber is preferred over copper because it is lighter
and less bulky—hence improving the airflow—and supports a scalable and
future-proof deployment of server racks inside the data center. Each optical
fiber link consists of a driver modulating a light source at the transmitter
side, converting electrical data into light. After propagating through an SMF
or MMF interconnect, light is converted back to an electrical voltage at the
receiver side by a photodetector cascaded with a transimpedance amplifier
(TIA).
Since all these facilities together consume up to 3% of the global electricity
production [7], the ecological impact cannot be neglected. Hence, there is a
strong need to investigate scalable and energy efficient or ‘green’ alternatives
for interconnections inside data centers, to cope with the information capacity
requirements of the future.
1.2 Data center intraconnects targeting 400Gb Ethernet
Data center traffic is mostly based on the Ethernet packet-switched protocol and
is currently transitioning from 10G/40G to 25G/100G and 100G/400G
(server-to-switch speed/switch-to-switch speed) [8]. Directly binary-modulated
850 nm VCSEL-based optical links transmitting over MMF dominate
the data center market today. Pluggable transceivers operating at 25Gb/s
non-return-to-zero (NRZ) are commercially available. However, modal
dispersion¹ limits the transmission distance to 100m across OM4 MMF when
assuming Reed-Solomon RS(528,514) forward error correction (FEC) at
the host, and to 40m without FEC [9, 10]. The aggregate data
rate is increased to 100Gb/s by providing four parallel 25Gb/s lanes: either four
parallel MMFs combined in a single multi-fiber push-on (MPO) connector
as described in the 100GBASE-SR4 standard, or four different wavelengths
on a single MMF using short-wavelength division multiplexing (SWDM) as
defined by 100GBASE-SWDM4 [8]. The reach could be extended to 300m
if the data rate is lowered to 10Gb/s, but this necessitates the use of ten parallel
lanes to achieve the same 100Gb/s capacity. A better alternative to enhance
the link coverage is transitioning to SMF. 100Gigabit Ethernet (GbE)
solutions again exploit four parallel fibers running at 25Gb/s (100G-PSM4) or
utilize four wavelengths in the O-band spectrum (100GBASE-CWDM4).
Link lengths up to 2 km are guaranteed when considering RS(528,514)
FEC (also called KR4 FEC) at the host, with a pre-FEC bit-error ratio
(BER) of 1.4 × 10⁻⁵. The biggest downside of SMF solutions, however,
is cost effectiveness, which is dominated by the tighter optical alignment
requirements (9 µm core for SMF versus 50 µm for MMF) and the integration
of edge-emitting distributed-feedback (DFB) lasers instead of VCSELs.

¹ Due to the large core diameter of 50 µm in a MMF, multiple modes propagate at different
velocities in the fiber. These modes combine back at the receiver side, creating a broadened
pulse. The associated intersymbol interference (ISI) will limit the maximum achievable data
rate.

Table 1.2: Overview of the proposed 400Gb Ethernet standards for short-reach and
long-reach data center intraconnects where the raw lane rate takes into account the overhead
associated to FEC and 256B/257B transcoding [11].
Standard         Rate (GBd)    λ (nm)           Link (m)   FEC
400GBASE-SR16    16 × 25.78    840-860          100        KR4
400GBASE-FR8     8 × 26.56     1273-1309        2000       KP4
400GBASE-LR8     8 × 26.56     1273-1309        10000      KP4
400G-PSM4        4 × 53.13     1304.5-1317.5    500        KP4
The demand for more data transmission capacity forces the vendors and
data center operators to explore and define new technologies and standards.
The 400Gigabit Ethernet standard, which is defined by the IEEE P802.3bs
400GbE Task Force, will need to utilize multiple advanced techniques
to boost the single lane rate in an attempt to maintain a similar port
density. Multi-level intensity modulation, such as four-level pulse amplitude
modulation (PAM-4), effectively doubles the data rate at a given symbol rate
and allows reusing existing 25G technology distributed across eight
wavelengths in the O-band as a preliminary solution. This approach targets
links up to 2 km and 10 km and is defined in the 400GBASE-FR8 and
400GBASE-LR8 standards respectively. Another method, referred to as
400G-PSM4, pursues 50GBd up to 500m utilizing four parallel SMFs. All
long-wavelength 400GbE interconnects consider RS(544,514), known as KP4
FEC, with a pre-FEC BER of 2.2 × 10⁻⁴, to ultimately achieve a BER ≤ 10⁻¹³
at the media access control sublayer [11]. The short-reach MMF standard
400GBASE-SR16 drives parallelization even further, i.e. up to sixteen lanes
of 25Gb/s, and seems like a short-term solution. This time, KR4 FEC is
applied to reach 100m across OM4 MMF. Table 1.2 summarizes the main
specifications of the proposed 400GbE standards.
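The lane rates in Table 1.2 can be cross-checked from the coding overhead alone. The following is a minimal sketch, assuming the usual code definitions RS(528,514) for KR4 and RS(544,514) for KP4 combined with 256B/257B transcoding, one bit per symbol for the NRZ lanes of 400GBASE-SR16 and two bits per symbol for the PAM-4 lanes. The function names are illustrative and not taken from the standard.

```python
from fractions import Fraction


def raw_line_rate_gbps(payload_gbps, rs_n, rs_k=514):
    """Aggregate line rate after 256B/257B transcoding and RS(rs_n, rs_k) FEC."""
    transcoding = Fraction(257, 256)
    fec = Fraction(rs_n, rs_k)
    return payload_gbps * transcoding * fec


def lane_symbol_rate_gbd(payload_gbps, rs_n, lanes, bits_per_symbol):
    """Symbol rate per lane in GBd."""
    return raw_line_rate_gbps(payload_gbps, rs_n) / lanes / bits_per_symbol


rates = {
    # NRZ lanes with KR4 FEC
    "400GBASE-SR16": lane_symbol_rate_gbd(400, 528, lanes=16, bits_per_symbol=1),
    # PAM-4 lanes with KP4 FEC
    "400GBASE-FR8/LR8": lane_symbol_rate_gbd(400, 544, lanes=8, bits_per_symbol=2),
    "400G-PSM4": lane_symbol_rate_gbd(400, 544, lanes=4, bits_per_symbol=2),
}

for name, rate in rates.items():
    print(name, float(rate), "GBd per lane")
```

Exact fractions are used so that the overhead factors (33/32 for KR4 and 17/16 for KP4) are not blurred by rounding before they are compared with the tabulated values.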
1.3 Recent advances in VCSEL-based transmitters
1.3.1 Short- and long-wavelength VCSELs
Although directly modulated edge-emitting lasers, such as the DFB laser,
are the commodity transmitter in long-wavelength data center transceivers
up to 100GbE, they are not necessarily the preferred technology for
next-generation interconnects. Characteristics that play in their favor are
reliability, precise wavelength tuning, a narrow linewidth and maintained
performance at elevated substrate temperatures, e.g. reaching an optical output power
of 5mW and a bandwidth of 20GHz at 70 °C [12]. Unfortunately, these
high-speed devices are associated with threshold currents of around 10mA
and operating currents varying between 70mA and 150mA [12–14]. Aside
from focusing on the modulation bandwidth, reducing the threshold and
bias current of a DFB laser is another important research topic. Devices
with a threshold current of 0.21mA and a bias current of 1.08mA have
shown a modulation rate of 15Gb/s [15], while [16] achieved 40Gb/s using
a bias current of only 15mA. This corresponds to a wall-plug efficiency² of
3% [16] and 10.6% [15]. In contrast, a state-of-the-art short-wavelength
(850 nm) VCSEL is capable of achieving around 24% [17].

² Optical power P divided by electrical input power I · V, of which the values are derived
from PIV-curves and modulation experiments.

Hence, it is worthwhile investigating the potential of long-wavelength
VCSELs for long-reach data center intraconnects. Aside from the attractive
energy efficiency, wafer-scale fabrication and testing as well as dense 1D/2D
array configurations are properties that pave the way to a cheaper product
than their edge-emitting counterpart. At this stage, long-wavelength VCSELs
above 10Gb/s are not yet commercially available because the technology
is not mature enough to satisfy all stringent data communication
requirements [18]. In the past years, a lot of research was invested in improving
the high-speed modulation performance of long-wavelength VCSELs. In
particular, the European FP7 framework projects MIRAGE and PHOXTROT, which
started at the end of 2012 and also provided funding for this thesis, greatly
contributed to the development of 1.5 µm VCSELs. In addition, the EU FP7
project RAPIDO, which launched in 2014, focused on 1.3 µm VCSELs. The
common denominator between these projects is the company Vertilas, which
originated as a spin-off from the Technische Universität München (TUM).
The group of prof. M.-C. Amann together with dr. M. Ortsiefer laid
the foundation for InP-based VCSELs implemented with a buried tunnel
junction (BTJ) [19]. InP-based materials, which are required for the active
region to lase at wavelengths above 1.3 µm, cannot be grown on a GaAs
substrate. GaAs is typically used for short-wavelength devices and has the
advantage that it accommodates highly reflective and thin Bragg mirror pairs.
Basically, the BTJ concept resolves some thermal problems and enables the
integration of strongly reflecting materials on an InP substrate [18]. In contrast,
prof. E. Kapon from EPFL, Switzerland, pursues another methodology that
enables the integration of GaAs-based mirrors together with an InP-based
active region on a GaAs substrate. This is made possible by applying a
technique called wafer fusion or wafer bonding. The inventors claim that
this process can result in a high-yield fabrication because the mirrors and the
active region can be separately optimized. However, the fabrication process
is complex as it requires several epitaxial growth steps on different wafers
before they are bonded together. The bonding interfaces can also intro-
duce extra resistivity leading to larger drive voltages and heat dissipation [18].
Historically, research on improving the modulation bandwidth of
short-wavelength GaAs VCSELs proliferated thanks to industrial interest.
This is reflected in Fig. 1.3, which shows that they are still the fastest
surface-emitting lasers presented in academia, characterized by a bandwidth of
30GHz. Although the long-wavelength alternatives are lagging behind, they
have evolved rapidly the past couple of years. Especially the 1.5 µm BTJ
lasers from TUM, developed during the MIRAGE project, showed great
improvement with a reported bandwidth up to 22GHz. At the wavelength of
interest for Ethernet applications, i.e. 1.3 µm, the InP-based BTJ VCSELs are
outperforming the wafer-fusion alternatives with a bandwidth of 17GHz and
11.5GHz respectively. It is expected that the experience gained by TUM and
Vertilas to enhance the modulation speed of 1.5 µm InP-based BTJ VCSELs
can trickle down to 1.3 µm devices, making them of significant interest for
future Ethernet standards. It should be clarified though that maximizing the
modulation bandwidth does not always lead to the best data transmission
results at a certain data rate, as a peaked response can give rise
to excessive jitter and overshoot. Therefore, recent design efforts rather
focus on optimizing the photon lifetime to control the damping factor
of the intrinsic response [20]. Reducing the internal resistance and heat
generation are also important design aspects that are necessary to improve
long-term reliability and manufacturability [21].
Figure 1.3: Evolution of the modulation bandwidth of VCSELs targeting MMF and SMF
applications [17, 20, 22–38]. (Plot of bandwidth in GHz versus year of publication, with
series grouped by emission wavelength.)
1.3.2 VCSEL drivers
As the title of this dissertation already suggests, the main objective of the
past five years comprised the design of high-speed VCSEL drivers
commissioned by the EU FP7 projects MIRAGE and PHOXTROT. Both projects
relied on 1.5 µm VCSEL-based transceivers embedded in a three-dimensional
optical engine to deliver chip-to-chip, board-to-board and rack-to-rack
interconnects. Aggregate data rates in the order of Tb/s were pursued by
exploiting parallelization through wavelength division multiplexing (WDM)
and spatial division multiplexing (SDM) using multi-core SMF. The lane rate
was initially specified at 25-40GBd NRZ modulation for PHOXTROT and
25-40GBd PAM-4 modulation for MIRAGE. Hence, the driver electronics
needed to support these data rates while featuring a multi-channel die layout
matched to the pitch of the VCSEL array. As can be noticed from Fig. 1.3,
the modulation bandwidth of 1.5 µm VCSELs dating back before 2012 (start
of both projects) is below the Nyquist bandwidth of 20GHz, which is
insufficient for 40GBd operation. To compensate for the ISI introduced by the
bandwidth limitation, feed-forward equalization (FFE) was incorporated in
the driver electronics. Each project allowed two subsequent design iterations
for the PAM-4 driver and the NRZ driver. They were all fabricated in
a 0.13 µm SiGe BiCMOS technology, which was the fastest technology
available for our lab at that time. The high-speed bipolar transistors were
characterized by a transition frequency (fT) of 230GHz and an open-base
collector-emitter breakdown voltage of 1.6V.

Figure 1.4: The relevance of the presented work is put in perspective when comparing it
with the state-of-the-art in VCSEL-based transmitters [28, 39–56]. (Plot of data rate in Gb/s
versus year of publication; series: NRZ MMF, PAM-4 MMF, NRZ SMF, PAM-4 SMF and this work.)
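The principle of compensating ISI with a symbol-spaced FFE can be illustrated with a toy example. The sketch below is not the 4-tap equalizer of this work: it assumes a hypothetical discrete-time one-pole channel, y[n] = a·y[n-1] + (1-a)·x[n], whose exact inverse happens to be a two-tap FIR filter. The channel model, the coefficient `a` and all function names are assumptions made for illustration only.

```python
def ffe(symbols, taps):
    """Symbol-spaced FIR filter: out[n] = sum_k taps[k] * symbols[n - k]."""
    out = []
    for n in range(len(symbols)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if n - k >= 0:
                acc += tap * symbols[n - k]
        out.append(acc)
    return out


def one_pole_channel(samples, a):
    """Hypothetical band-limited channel: y[n] = a*y[n-1] + (1 - a)*x[n]."""
    out, prev = [], 0.0
    for x in samples:
        prev = a * prev + (1.0 - a) * x
        out.append(prev)
    return out


def inverse_taps(a):
    """Main cursor and one post-cursor tap that invert one_pole_channel."""
    return [1.0 / (1.0 - a), -a / (1.0 - a)]


A = 0.5  # assumed channel memory, illustration only
pam4 = [3, -3, 1, -1, 3, 3, -3, 1]  # PAM-4 levels in arbitrary units

pre_emphasized = ffe(pam4, inverse_taps(A))
unequalized = one_pole_channel(pam4, A)
equalized = one_pole_channel(pre_emphasized, A)

print("unequalized:", unequalized)
print("drive signal:", pre_emphasized)
print("equalized:", equalized)
```

Without equalization the received levels are smeared by the preceding symbols, whereas the pre-emphasized drive signal restores the original levels at the channel output. The price is a larger peak swing of the drive signal, which hints at the trade-off between equalization strength and power dissipation.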
The state-of-the-art in recent VCSEL driver integrated circuits (ICs) indepen-
dent of the transistor technology, is plotted in Fig. 1.4. A distinction is made
between short-wavelength (MMF) and long-wavelength (SMF) applications
and between NRZ and PAM-4 modulation. The majority of the work
focuses on 850 nm NRZ implementations, demonstrating how FFE is
capable of extending the data rate of bandwidth-limited VCSELs.
Record-breaking results were achieved by IBM, such as 64Gb/s across 57m of OM4
MMF and 71Gb/s in a back-to-back (BTB) link [51, 52]. Other design
approaches attempted to minimize power consumption, reaching energy
efficiency numbers down to 0.77 pJ/bit [49]. The transition to PAM-4 VC-
SEL drivers started appearing in 2015, with the first entry being the 40Gb/s
1.5 µm driver developed in the first design iteration during this PhD [46].
Previously, only PAM-4 experiments with pulse pattern generators directly
modulating the VCSEL were demonstrated, such as [35]. Subsequently,
PAM-4 drivers without FFE showed great performance electrically up to
90Gb/s and 100Gb/s, but degraded to 40Gb/s and 56Gb/s when tested in
an optical assembly using short-wavelength VCSELs. The second-generation
driver IC for MIRAGE was equipped with a 4-tap symbol-spaced FFE to
extend the data rate to 56Gb/s [45]. Even though multi-level modulation is
able to achieve higher lane rates compared to the majority of NRZ
implementations and is less susceptible to dispersion induced by the fiber [46],
FEC is typically required for error-free operation (BER < 10⁻¹²). According
to IBM, the use of FEC introduces a latency that is not acceptable for
high-performance computing applications. They made a statement against
the adoption of PAM-4 in data centers by reporting an error-free FEC-less
NRZ driver operating at 56Gb/s across 2 km of dispersion-shifted fiber [53].
The NRZ driver developed for PHOXTROT also showed error-free FEC-less
performance at 40Gb/s over 500m of standard SMF using a 1.5 µm VCSEL
and over 4.5 km driving a 1.3 µm VCSEL [28, 47]. Especially the last
experiment reveals the potential of 1.3 µm VCSEL transmitters to serve as
an energy efficient and scalable data center intraconnect compatible with the
emerging 400GbE standards.
1.4 Outline of the dissertation
Throughout this dissertation, several aspects concerning the development
of the respective high-speed VCSEL drivers are covered, which can also
be integrated in a more general design flow for laser drivers. First of all,
an equivalent simulation model will be created of the load, being a 1.5 µm
InP-based VCSEL with BTJ of which the model parameters are fitted to
small-signal measurements and PIV-curves. In the end, a comprehensive
model based on rate equations is created that accurately matches the
non-linear behavior of the VCSEL. The next chapter deals with the electrical
interface between the drivers’ output stage and the VCSEL in order to
optimize the complete electro-optic modulation response. The entire design
space determined by the drivers’ output impedance, interconnect inductance,
on-chip equalization and bias current of the VCSEL is explored to find the
optimal trade-off between performance and power dissipation. The last two
chapters elaborate on the integrated circuit design of the dual-channel PAM-4
and quad-channel NRZ driver developed as part of the EU FP7 projects
MIRAGE and PHOXTROT. A design methodology for retiming flip-flops and
a generic reusable current mirror were developed in light of the symbol-spaced
4-tap FFE PAM-4 output stage. In addition, multiple techniques are described
to compensate for the non-linear behavior of the VCSEL. A common property
of cathode-drive VCSEL drivers is the presence of multiple supply voltages.
A concept and architecture are proposed that enable the VCSEL driver to
run off a single supply voltage while preserving high-speed performance.
This concept was also filed as a patent application [57]. Both chapters
end with a summary of the conducted experiments published in conference
proceedings and journal papers. Finally, the dissertation concludes with a
critical overview of the presented work and a discussion of the applicability
in next-generation data center intraconnects.

2 Modeling long-wavelength VCSELs
The first step in the design of a VCSEL driver is an analysis of the static and
dynamic behavior of the VCSEL. This analysis involves examining the PIV -
curves together with the small- and large-signal modulation performance.
Afterwards, it becomes possible to create an equivalent model useful for
simulating the cascaded response of the driver and the VCSEL. This chapter
starts with a description of the structure of a long-wavelength VCSEL with a
BTJ, which allows mapping the electrical parasitics to the physical structure.
The next step comprises the modeling of the intrinsic operation of the
VCSEL, which essentially takes care of the electro-optic conversion. Various
approaches will be discussed that attempt to match the dynamic behavior of
characterized VCSELs fabricated by TUM.
2.1 Structure of long-wavelength VCSELs
The long-wavelength VCSELs provided by TUM (and Vertilas) for this
thesis were designed for operation at 1.5 µm. Since the wavelength of a laser
diode depends on the bandgap energy Eg of the semiconductor material, an
InP-based platform was most appropriate. Earlier implementations of this
type of VCSEL suffered from thermal issues degrading the performance
and reliability. Fortunately, incorporating a BTJ inside the device makes it
possible to replace p-doped layers with n-doped layers. As the electrical
resistance was previously dominated by the p-doped layers, the transition to
mostly n-doped layers greatly reduces Joule heating [18]. A cross sectional
view of an InP-based 1.5 µm VCSEL can be observed in Fig. 2.1a. Amplified
stimulated emission occurs in the active region, which is composed of seven
6.1 nm thick AlGaInAs quantum wells grown with 1.2% compressive strain.
The active region is sandwiched between an overgrown n-doped InP layer
and a highly p-doped AlInAs cladding, creating a double heterojunction
structure. The optical cavity resides between two dielectric distributed Bragg
reflectors (DBRs). The top DBR consists of 5 pairs of AlF3/ZnS dielectric
layers and the bottom DBR of 3.5 pairs of AlF3/ZnS-Au metallic-dielectric
hybrid layers. Reflectivity is 99.4% and 99.9% respectively, leading to a
top-illuminated laser.

Figure 2.1: (a) Cross sectional view [58] and top view [59] of an InP-based 1.5 µm VCSEL
with a BTJ and a double-mesa structure. (b) Diode-equivalent network of the junction layers.
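The link between emission wavelength and transition energy mentioned above follows from E = hc/λ. As a rough sketch, the photon energies for the three wavelengths discussed in this work can be computed as follows. Using the photon energy as a stand-in for the bandgap is a simplifying assumption, since the emission energy of a quantum-well laser lies somewhat above the bulk bandgap; the function name is illustrative.

```python
H_PLANCK = 6.62607015e-34   # Planck constant in J*s
C_LIGHT = 299792458.0       # speed of light in m/s
Q_E = 1.602176634e-19       # elementary charge in C


def photon_energy_ev(wavelength_m):
    """Photon energy E = h*c/lambda, expressed in electronvolt."""
    return H_PLANCK * C_LIGHT / (wavelength_m * Q_E)


for wavelength_nm in (850, 1300, 1500):
    energy = photon_energy_ev(wavelength_nm * 1e-9)
    print(wavelength_nm, "nm ->", round(energy, 3), "eV")
```

Long-wavelength emission therefore requires a noticeably smaller transition energy than 850 nm emission, which is why a different material system is needed for the active region.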
The BTJ is located at the degenerate p+-AlGaInAs-n+-GaInAs interface and
is equivalent to a tunnel diode. Due to the orientation of the tunnel junction
with respect to the active region, the tunnel diode is reverse biased when
forward biasing the VCSEL, as drawn in Fig. 2.1. In this region, the tunnel
junction behaves as a resistor, see Appendix A.2 for more information about
the tunnel diode operating regions. The circular BTJ with diameter dBTJ
is surrounded by a reverse biased one-sided abrupt pn-junction, composed
of p+-AlGaInAs and n-InP with a diameter of dmesa. The combination of
the tunnel junction and the blocking p+n-structure results in an excellent
lateral current confinement with almost three orders of magnitude difference
in current density, thereby greatly improving the efficiency of the stimulated
emission [18]. Since the DBRs are electrical insulators, the current flows
radially from anode to cathode where it is confined in the active region by
means of the BTJ and the quantum wells. The electrical resistance is reduced
by incorporating highly doped current spreading layers inside the overgrown
InP layer. The anode is implemented as a gold substrate that effectively
acts as a heat sink and serves as the common reference for VCSEL arrays.
Both the anode and the cathode are accessible for bonding from the top
of the device through coplanar contact pads, see Fig. 2.1. Because of the
common-anode structure, this type of VCSEL array can only be driven via
the cathode.
The semiconductor cavity is electrically isolated from the contact pads
and from other cavities in an array configuration by embedding it in
benzocyclobutene (BCB). Opting for BCB as insulator is also a way to
reduce the cathode pad capacitance because of the low dielectric constant
(εr = 2.65). Another approach that was developed by TUM to improve
high-speed performance is the double-mesa structure, of which Fig. 2.1 is an
example. As the blocking p+n-junction introduces a depletion capacitance
proportional to the junction area $\pi (d_{mesa}^2 - d_{BTJ}^2)/4$, the electrical
bandwidth can be drastically improved by reducing the bottom mesa diameter [60].
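To get a feeling for how strongly the mesa diameter affects the blocking-junction capacitance, the annular junction area can be compared for a few diameters. This is a minimal sketch: the mesa diameters of 16 µm and 28 µm are taken from the range discussed later in this chapter, while the BTJ diameter of 4 µm is an assumption chosen for illustration and the bias dependence of the capacitance per unit area is ignored.

```python
import math


def annulus_area_um2(d_mesa_um, d_btj_um):
    """Area of the blocking p+n-junction: pi * (d_mesa^2 - d_BTJ^2) / 4."""
    return math.pi * (d_mesa_um ** 2 - d_btj_um ** 2) / 4.0


D_BTJ_UM = 4.0  # assumed BTJ diameter, illustration only

area_small = annulus_area_um2(16.0, D_BTJ_UM)
area_large = annulus_area_um2(28.0, D_BTJ_UM)

# For a fixed capacitance per unit area, the depletion capacitance
# scales with the same ratio as the area.
print("area ratio 16 um / 28 um:", area_small / area_large)
```

Under these assumptions, shrinking the mesa from 28 µm to 16 µm reduces the junction area, and hence the depletion capacitance at a given bias, to roughly one third.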
2.2 Mapping an electrical network to the physical structure
The VCSEL can be represented by a 2-port black box consisting of two
cascaded subsections: electrical device parasitics and the active region
corresponding to the optical cavity. The total frequency response of the
VCSEL equals the product of the transfer functions of the cascaded
subsections, or equivalently the convolution of their impulse
responses [61, 62]. In order to achieve a good fit between the modeled
VCSEL and the measurements, it is important to map the elements of the
network to the physical structure of the device. Moreover, it can give insight
to the photonic device designer about the individual parasitics, hinting at
potential improvements. Unfortunately, a lot of different networks can be found
when consulting papers discussing the modeling of high-speed laser diodes.
The mapped network is depicted in Fig. 2.2 and is based on [29,63], which
specifically discuss 1.5 µm InP-based BTJ VCSELs. Because of the BTJ
structure, the component values and especially the bias-dependent behavior
differ greatly from e.g. oxide-confined short-wavelength VCSELs [64].

Figure 2.2: Cross sectional view of a 1.5 µm InP-based BTJ VCSEL annotated with the
electrical lumped network together with the confined radial current flow through the active
region.
The optical cavity corresponds to a damped second-order system with a
roll-off of 40 dB/decade and can be represented by a shunt RLC-circuit [62].
The RC time constant corresponds to the bias-dependent junction resistance
(Ra) and the sum of the depletion and diffusion capacitance (Ca) whereas
the photon storage is represented by a lossy inductor (R0, L0). The total
resistance seen by the radial current flow is a combination of the resistance of
the tunnel diode (Rm) in series with the resistance from the current-spreading
layers in the n- and p-doped layers (Rsp). In parallel with Rm is the mesa
depletion capacitance, resulting from the reverse-biased p+n-junction, and
the junction capacitance of the BTJ. The sum of both capacitances is
represented by Cm′. The cathode pad, measuring approximately 80 µm by 145 µm,
creates a shunt capacitance (Cp) to the anode while the resistive and inductive
losses are assumed to be negligible.
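The cascade of the electrical parasitics and the optical cavity can be sketched numerically as the product of a single-pole low-pass response and a damped second-order response. The sketch below uses the commonly applied three-pole form of the small-signal response; the parasitic pole `F_P`, resonance frequency `F_R` and damping factor `GAMMA` are hypothetical values chosen for illustration and are not extracted from the devices in this work.

```python
import math


def h_parasitic(f, f_p):
    """Single-pole low-pass response of the electrical access network."""
    return 1.0 / complex(1.0, f / f_p)


def h_intrinsic(f, f_r, gamma):
    """Damped second-order response of the optical cavity."""
    return f_r ** 2 / complex(f_r ** 2 - f ** 2, f * gamma / (2.0 * math.pi))


def h_total(f, f_p, f_r, gamma):
    """Cascaded subsections multiply in the frequency domain."""
    return h_parasitic(f, f_p) * h_intrinsic(f, f_r, gamma)


def to_db(h):
    return 20.0 * math.log10(abs(h))


F_P = 20e9    # hypothetical parasitic pole in Hz
F_R = 15e9    # hypothetical resonance frequency in Hz
GAMMA = 4e10  # hypothetical damping factor in 1/s

for f_ghz in (1, 10, 20, 30):
    f = f_ghz * 1e9
    print(f_ghz, "GHz:", round(to_db(h_total(f, F_P, F_R, GAMMA)), 2), "dB")
```

Far above the resonance, the intrinsic part rolls off at 40 dB/decade and the parasitic pole adds another 20 dB/decade, so the total response approaches 60 dB/decade.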
The photon-electron interaction is modeled by cascading the active region
with the electrical parasitics, which is sufficient if the sole purpose is to
acquire element values based on small-signal measurements. However, if the
network is to serve as a load for a driver circuit, then some modifications
are necessary to set up reliable simulation experiments. First of all, the
shunt RLC-circuit equivalent to the active region is not physically present.
Hence, there is no direct interaction between the output impedance of the
driver and the RLC-circuit. The electrical and optical domains can be
isolated from each other by inserting a current-dependent generator in the
active region of the network [64]. This requires that the pn-heterojunction
(Ra, Ca) is present both at the electrical and optical side of the model in
order to function electrically as a diode and optically as a resonator.

Figure 2.3: Electrical lumped small-signal model that can be fitted to S11 and S21
measurements for a specific DC current Iv.
Since the BTJ is in series with the active region, it is very difficult to obtain
values for Cm′ , Rm, Ra and Ca that follow a deterministic trend, unlike [63].
According to [27,29], the depletion capacitance from the current-blocking
diode (Cm′) and the BTJ resistance (Rm) are responsible for the dominant
pole of the parasitic electrical network. Indeed, by converting the Z-matrix of
the electrical parasitic network for an InP-based BTJ VCSEL to an S-matrix
and comparing it to the S11-measurement, it can be concluded that the active
region has a marginal impact on the input reflection coefficient because of
the relatively small AC-impedance [65]. The series combination of Ra and
Rm is therefore shunted with an aggregate capacitance Cm. The individual
effects of Cm′ and Ca are hereby approximately captured by a single value
which simplifies the fitting procedure. In order to make the network easily
transformable to a large-signal model, Ra corresponds to the dynamic
resistance of the heterojunction. The modified equivalent electro-optical
(E/O) network is drawn in Fig. 2.3. The electrical part is isolated from
the optical resonator in the active region by means of a current-dependent
voltage source connected to a series RLC-circuit, similar to [49].
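The input impedance of the simplified electrical access network and its relation to the input reflection coefficient can be sketched as follows. The topology follows the description above: the pad capacitance Cp in shunt at the input, followed by Rsp in series with the parallel combination of Rm + Ra and Cm. The capacitances are of the order quoted in this chapter, whereas the resistances are placeholder values for illustration; the 50 Ω reference impedance and the function names are assumptions as well.

```python
import math


def z11_access_network(f_hz, r_sp, r_j, c_m, c_p):
    """Input impedance: Cp in shunt with (Rsp in series with (Rm+Ra) || Cm)."""
    w = 2.0 * math.pi * f_hz
    z_mesa = r_j / complex(1.0, w * r_j * c_m)   # (Rm + Ra) || Cm
    z_branch = r_sp + z_mesa                     # add series resistance Rsp
    y_in = complex(0.0, w * c_p) + 1.0 / z_branch
    return 1.0 / y_in


def z_to_s11(z, z0=50.0):
    """One-port impedance to reflection coefficient."""
    return (z - z0) / (z + z0)


def s11_to_z(s11, z0=50.0):
    """One-port reflection coefficient to impedance."""
    return z0 * (1.0 + s11) / (1.0 - s11)


# Placeholder component values, illustration only
R_SP, R_J = 10.0, 65.0          # ohm
C_M, C_P = 316e-15, 78e-15      # farad

for f_ghz in (1, 5, 10, 20):
    z = z11_access_network(f_ghz * 1e9, R_SP, R_J, C_M, C_P)
    print(f_ghz, "GHz: Z11 =", z, " |S11| =", abs(z_to_s11(z)))
```

At low frequency the impedance tends to Rsp + Rm + Ra, while at high frequency the capacitances progressively short the resistive branch, which is the behavior the fitting procedure exploits.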
2.3 Extraction of component values
2.3.1 Electrical access network
The S-parameters provided by TUM cover the frequency range from 45MHz
to 26.5GHz. They were characterized with a u2t Photonics photodetector,
which introduces a negligible loss of 0.4 dB at 26.5GHz; the receiver is
therefore considered transparent for the extraction. The electrical network of
Fig. 2.3 is implemented in Matlab using RF-circuit objects, supporting
automatic analysis of S-parameters. The measured S11 is converted to Z11
before the component values are extracted using the built-in non-linear
least-squares curve-fitting solver function lsqcurvefit. The S-to-Z
conversion is merely chosen for clearer visualization of extraction discrepancies.
Faster convergence of the fitting tool is realized by imposing bounds
on the component variables and specifying typical values as starting point.
The default trust-region-reflective algorithm of lsqcurvefit was used, which
requires that the number of function values (Z11) is at least the number of
input variables (Rsp, Cm, Rm+Ra). Since Z11 is evaluated for each frequency, this
condition is certainly met. Note that Cp does not occur in the list of
input variables, because the pad capacitance can be estimated based on the
parallel-plate capacitance formula and is assumed independent of the bias
current. Cathode pad dimensions of approximately 80 µm by 145 µm and a
3.5 µm thick BCB insulator with an εr of 2.65 lead to 78 fF.
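The pad capacitance estimate can be reproduced with the parallel-plate formula C = ε0·εr·A/d. The short sketch below uses the pad dimensions, insulator thickness and dielectric constant stated in the text; fringing fields are neglected, which is an assumption of the parallel-plate approximation, and the function name is illustrative.

```python
EPS0 = 8.8541878128e-12  # vacuum permittivity in F/m


def parallel_plate_capacitance(width_m, length_m, thickness_m, eps_r):
    """Parallel-plate capacitance, neglecting fringing fields."""
    return EPS0 * eps_r * width_m * length_m / thickness_m


c_pad = parallel_plate_capacitance(80e-6, 145e-6, 3.5e-6, 2.65)
print("Cp =", round(c_pad * 1e15, 1), "fF")
```

The result rounds to the 78 fF used for Cp in the extraction.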
A 4 µm-BTJ VCSEL, with a threshold current of 0.8mA and a roll-over current
of 9mA, is used to demonstrate the fit of the electrical model to
Z11 measurements. Figure 2.4 shows good agreement for two bias currents of
2.9mA and 6mA. The overall resistance decreases as the current through the
VCSEL increases. This is within expectations since the current through the
active region has an exponential relationship with the bias voltage, resulting
in a dynamic resistance inversely proportional to the current. The same
trend is noticeable when evaluating the individual component values of the
network, i.e. Rm, Ra, Cm and Rsp versus current Iv, shown in Fig. 2.5. More
specifically, the capacitance decreases from 316 fF to 211 fF whereas the
combined resistance Rm + Ra drops by 28% and Rsp by 50%. The
decrease of Cm with increasing bias current supports the hypothesis that
the depletion capacitance of the reverse-biased p+-n-junction is responsible
for the dominant pole of the electrical access network, since a depletion
capacitance decreases with increasing reverse bias voltage. Overall, it
can be concluded that the electrical bandwidth fp, defined by the transfer
function $i_a/v_s$, increases with the current.
Modeling long-wavelength VCSELs 19
[Figure 2.4 plots: (a) Re(Z11) and Im(Z11) in Ω versus frequency (0 to 30 GHz), simulation and measurement at 2.9 mA and 6 mA; (b) the same Z11 data at 2.9 mA and 6 mA on a Smith chart.]
Figure 2.4: (a) Z11 curves at bias currents Iv of 2.9mA and 6mA show good agreement
below 20GHz with measured curves of a 4 µm-BTJ VCSEL. (b) Representation on Smith
chart shows possible calibration issues in the upper frequency range.
[Figure 2.5 plot: resistances Rsp and Rm + Ra (Ω, left axis) and capacitance Cm (fF, right axis, 211 fF to 316 fF) versus VCSEL current (1 to 9 mA).]
Figure 2.5: Resistive and capacitive elements of the VCSEL model are inversely proportional
to the average current Iv.
An additional method to verify the reliability of the VCSEL model and
its corresponding fitting function is to extract the mesa capacitance Cm
for multiple VCSELs with different mesa diameters dmesa. As already
mentioned in Section 2.1, a reduction in mesa diameter should lead to a
smaller Cm and therefore also a higher bandwidth. Three different cases
are plotted in Fig. 2.6, where the diameter ranges from 16 µm to 28 µm. As
expected, the capacitance drops for smaller diameters with a corresponding
gain in bandwidth. Note that these double-mesa VCSELs have a longer
cavity than the one examined in Fig. 2.4 and exhibit a larger roll-over current,
hence the wider current range on the horizontal axis.
20 Chapter 2
[Figure 2.6 plots: (a) mesa capacitance Cm (fF) and (b) electrical 3 dB-bandwidth fp (GHz) versus VCSEL current (0 to 15 mA) for dmesa = 16 µm, 20 µm and 28 µm.]
Figure 2.6: (a) The hypothesis that Cm decreases with dmesa is validated by the VCSEL
model. (b) A gain in bandwidth is possible when opting for VCSELs with smaller mesa
diameters.
2.3.2 Optical resonator
Once the electrical access network has been fitted successfully to the S11
measurements, the optical resonator responsible for the intrinsic response of
the laser can be extracted from S21 measurements. The transfer function of
the VCSEL is the product of the electrical access response Hpar(f) and
the intrinsic response Hint(f), which corresponds to a convolution of their
impulse responses in the time domain. Unlike [29], where a first-order approximation
is used for Hpar(f), a more accurate method is to derive it directly from
the circuit model in Fig. 2.3. The intrinsic part Hint(f) is modeled as a
second-order response in Eq. (2.2) with relaxation oscillation frequency fr
and damping factor γ¹ [29].
\[ |H_v(f)| = |H_{par}(f) \cdot H_{int}(f)| \tag{2.1} \]
\[ = \left| \frac{i_a}{v_s} \cdot \frac{f_r^2}{f_r^2 - f^2 + j \frac{\gamma}{2\pi} f} \right| \tag{2.2} \]
The fitted VCSEL model corresponds quite well with the measured S21
response, as observed in Fig. 2.7. This was attained by averaging or omitting
some low-frequency content in order to better map the peaked response.
The extracted intrinsic parameters γ and fr together with fp can be useful
for the optical device designer to improve the design in various ways.
¹The damping factor γ equals ωr/Q, with Q representing the quality factor, and is expressed
in s−1.
[Figure 2.7 plot: normalized |S21| (dB) versus frequency (Hz, logarithmic axis from 10^7 to 10^11), simulation and measurement at three bias currents.]
Figure 2.7: Normalized S21 curves at Iv of 1.8mA, 2.9mA and 6mA show good agreement
above 1GHz with fitted transfer function |Hv ( f ) |.
[Figure 2.8 plots: fr and fp (GHz, left axis) and intrinsic damping γ (ns−1, right axis) versus Ivcsel (mA); panel (a) spans 1 to 9 mA, panel (b) spans 0 to 16 mA.]
Figure 2.8: Extracted damping γ, electrical bandwidth fp and relaxation resonance fr for
(a) USC VCSEL and (b) SC VCSEL with dmesa = 20 µm and dBTJ = 4 µm.
TUM experimented with two ways to enhance the intrinsic response: either
shortening the cavity length from 3 λ to 1.5 λ, thereby lowering the photon
lifetime, or increasing the compressive strain of the active region. These are
further referred to as the ultra-short-cavity (USC) and short-cavity (SC) designs,
respectively [66]. Both designs are compared in Fig. 2.8 and exhibit similar
electrical bandwidth and relaxation resonance up to 10mA. However, the
SC device experiences more damping, making it particularly interesting for
multi-level modulation.
Ultimately, both techniques applied in the SC and USC design can be
combined to extend fr even further, causing the electrical parasitics to be
the main bottleneck. However, implementing a double-mesa structure can
reduce this limitation by boosting fp. This approach leads to a theoretical
bandwidth of 26GHz [66], which is probably the maximum achievable
performance of long-wavelength InP VCSELs targeting output powers above
1mW at room temperature.
2.4 Large signal model
Although fitting the VCSEL to a small-signal lumped network can assist in a
better understanding of the cause of the bandwidth limitations, expansion to
a large-signal model is required for reliable VCSEL driver simulations. First
of all, the electrical behavior of the model needs to match that of a Shockley
diode. Secondly, non-linear modulation characteristics such as asymmetric
rise and fall time and/or overshoot are to be included. Various approaches
will be discussed and validated through static and dynamic experiments.
2.4.1 Shockley diode
The static I-V characteristics of a VCSEL can be approximated by inserting a
voltage source equal to the forward threshold voltage in series with the anode
of the VCSEL. However, this method is only valid in the linear region when
the device is already forward biased. Operation near threshold requires the
insertion of a Shockley diode that effectively models the pn-heterojunction
in the active region [64]. Resistor Ra from Fig. 2.3 represents the dynamic
resistance of the diode and is replaced with the diode itself in the modified
circuit of Fig. 2.9.
Solving Kirchhoff's voltage law on the electrical access network results in
Eq. (2.3), where the equivalent resistance Rma corresponds to the extracted
value Rm + Ra from previous Z11 experiments. Therefore, Ra needs to be
subtracted from the total resistive voltage drop, leading to Eq. (2.4) after
insertion of the Shockley diode voltage equation.
Figure 2.9: Electrical lumped model expanded with Shockley diode Da and current-dependent
network elements to emulate static I-V behavior.
[Figure 2.10 plots: (a) resistances Ra, Rm and Rsp + Rma (Ω) versus VCSEL current Iv (mA), measurement and simulation; (b) voltage across the VCSEL Vv (V) versus Iv (mA), measurement and simulation.]
Figure 2.10: (a) Equivalent VCSEL series resistance and the individual contribution of Ra
and Rm. (b) Static I-V curves fitted using parameters n = 2.107 and Is = 66 pA for diode
Da .
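\[ V_v = V_a + I_a \cdot (R_{ma} + R_{sp} - R_a), \quad R_{ma} = R_m + R_a \tag{2.3} \]
\[ = n \cdot V_t \ln\left(\frac{I_a}{I_s} + 1\right) + I_a \cdot (R_{ma} + R_{sp}) - n \cdot V_t, \quad R_a = \frac{n \cdot V_t}{I_a} \tag{2.4} \]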
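As an illustration of Eq. (2.4), the static voltage can be evaluated in Python with the fitted diode parameters n = 2.107 and Is = 66 pA from Fig. 2.10. This is a sketch only: the resistances R_MA and R_SP below are hypothetical constants picked for illustration, whereas in this work they are bias dependent, and the function names are not taken from the Verilog-A code in the appendix.

```python
import math

N_IDEALITY = 2.107  # fitted ideality factor (Fig. 2.10)
I_SAT = 66e-12      # fitted saturation current in A (Fig. 2.10)
V_T = 0.026         # thermal voltage in V at room temperature


def dynamic_resistance(i_a):
    """Dynamic diode resistance Ra = n * Vt / Ia."""
    return N_IDEALITY * V_T / i_a


def vcsel_voltage(i_a, r_ma, r_sp):
    """Eq. (2.4): Shockley diode voltage plus resistive drop, with Ia * Ra removed."""
    v_diode = N_IDEALITY * V_T * math.log(i_a / I_SAT + 1.0)
    return v_diode + i_a * (r_ma + r_sp) - N_IDEALITY * V_T


# Hypothetical, bias-independent resistances in ohm (assumption for this sketch).
R_MA, R_SP = 35.0, 25.0
for i_ma in (2, 4, 6, 8):
    i_a = i_ma * 1e-3
    print(f"Ia = {i_ma} mA: Ra = {dynamic_resistance(i_a):.1f} ohm, "
          f"Vv = {vcsel_voltage(i_a, R_MA, R_SP):.3f} V")
```

The subtraction of n·Vt avoids counting the diode's dynamic resistance twice, since Rma already contains Ra.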
Thermal voltage Vt is assumed to be 26mV, corresponding to room-temperature
operation. The heterojunction ideality factor n and saturation
current Is are found after fitting the modeled Vv to actual Iv-Vv experiments.
Unfortunately, resistances Rma and Rsp are not extracted below threshold
from the Z11 characterization. Therefore, they need to be extrapolated using
the inverse relation P0 + P1/(1 + Iv) [64], where P0 and P1 are fitting factors. The
fitting results for the series resistance of the VCSEL and the static I-V
curves show good agreement with the measured results, as seen in Fig. 2.10.
Sufficiently above the threshold current of 1mA, Rm, calculated from
Rma − Ra, remains nearly constant while Ra keeps decreasing. This can
be explained by the linear relationship between current and voltage for the
reverse-biased tunnel junction, as shown in Fig. A.4 in Appendix A, and by the
exponential relationship between current and voltage for the heterojunction.
The positive slope of Rm near threshold is rather surprising, and it is not clear
whether it is caused by the lack of characterized Z11 data in this region or
whether it can be attributed to a thermal effect. In any case, this uncertainty does not
matter for high-speed modulation applications, as the VCSEL needs to
be heavily biased for maximum bandwidth.
The electrical access network can further be non-linearized by finding the
fitting factors P0 and P1 for Rsp and Cm by reusing the fitting function
P0 + P1/(1 + Iv). Although not explicitly mentioned in literature, it is assumed in
this work that the bias-dependent behavior of the electrical access network
is caused by self-heating of the VCSEL. Since the impedance of the VCSEL
varies with ambient temperature [67], internal heating mechanisms such as
non-radiative carrier recombination inside the cavity (being proportional
to the bias current) and ohmic heating losses due to carrier transport are
likely to induce similar effects. Taking into account a thermal time constant in
the order of 1 µs [68], which is significantly slower than the modulation rate,
it is safe to conclude that the impedance of the VCSEL remains constant
regardless of the instantaneous value of the modulation current.
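Because the relation P0 + P1/(1 + Iv) is linear in the variable u = 1/(1 + Iv), the fitting factors can be found with ordinary least squares. The sketch below is an illustration in Python rather than the Matlab routine used for this work. It assumes Iv is expressed in mA, and the sample data are synthetic values generated from hypothetical factors P0 = 40 and P1 = 60.

```python
def inverse_relation(i_ma, p0, p1):
    """Fitting function P0 + P1 / (1 + Iv), with Iv assumed to be in mA."""
    return p0 + p1 / (1.0 + i_ma)


def fit_inverse_relation(currents_ma, values):
    """Ordinary least-squares fit, using that the model is linear in u = 1 / (1 + Iv)."""
    u = [1.0 / (1.0 + i) for i in currents_ma]
    n = len(u)
    mean_u = sum(u) / n
    mean_y = sum(values) / n
    s_uu = sum((ui - mean_u) ** 2 for ui in u)
    s_uy = sum((ui - mean_u) * (yi - mean_y) for ui, yi in zip(u, values))
    p1 = s_uy / s_uu
    p0 = mean_y - p1 * mean_u
    return p0, p1


# Synthetic, noise-free samples above threshold (hypothetical values, not measured data).
currents = [2.0, 3.0, 4.0, 6.0, 8.0]
samples = [inverse_relation(i, 40.0, 60.0) for i in currents]
p0, p1 = fit_inverse_relation(currents, samples)

# Extrapolation to a current below the fitted range, as needed near threshold.
r_below = inverse_relation(0.5, p0, p1)
print(f"P0 = {p0:.2f}, P1 = {p1:.2f}, extrapolated value at 0.5 mA = {r_below:.2f}")
```

The same routine applies to Rma, Rsp and Cm, each with its own pair of fitting factors.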
2.4.2 Non-linear optical resonator
One way to obtain a non-linear model using a reduced set of intrinsic device
parameters is to start from the linearized rate equations and follow the
approach described by [49]. This leads to a second-order low-pass frequency
response like Eq. (2.2), of which the relaxation resonance frequency fr and
damping factor γ can be written as a function of rate equation parameters. As
fr increases with the photon density, it also increases with the
junction current Ia above threshold Ith, which can be
simplified to
\[ f_r \approx D \sqrt{I_a - I_{th}}. \tag{2.5} \]
The rate at which fr increases with the square root of Ia − Ith is denoted as the
D-factor. Likewise, the damping factor
γ increases linearly with the square of the resonance frequency [29,49] as follows
\[ \gamma \approx K f_r^2 + \gamma_0, \tag{2.6} \]
with damping offset γ0 and slope K-factor. As already observed in Fig. 2.8
and in accordance with Eqs. (2.5) and (2.6), increasing the VCSEL current
shifts fr to higher frequencies, but the gain is limited in the end by a higher γ.
This allows the K-factor, D-factor and γ0 to be extracted from the straight-line
approximation fitted to the parameters γ and fr originating from Eq. (2.2).
This extraction process is demonstrated in Fig. 2.11 for a USC VCSEL.
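Equations (2.5) and (2.6) can be evaluated directly once the D-factor, K-factor and γ0 are known. The Python sketch below uses the values reported in Fig. 2.11 and, as an assumption, the threshold current of 0.83 mA from the PI fit of the USC VCSEL in Fig. 2.14. The function names are illustrative.

```python
import math

D_FACTOR = 10.041  # GHz per sqrt(mA), Fig. 2.11(a)
K_FACTOR = 0.254   # ns, Fig. 2.11(b)
GAMMA_0 = 10.265   # 1/ns, Fig. 2.11(b)
I_TH_MA = 0.83     # mA, assumed threshold taken from the PI fit in Fig. 2.14


def relaxation_frequency_ghz(i_a_ma):
    """Eq. (2.5): fr = D * sqrt(Ia - Ith); returns 0 below threshold."""
    return D_FACTOR * math.sqrt(max(i_a_ma - I_TH_MA, 0.0))


def damping_per_ns(f_r_ghz):
    """Eq. (2.6): gamma = K * fr^2 + gamma0 (GHz^2 times ns gives 1/ns)."""
    return K_FACTOR * f_r_ghz ** 2 + GAMMA_0


for i_ma in (2.0, 4.0, 6.0, 8.0):
    f_r = relaxation_frequency_ghz(i_ma)
    print(f"Ia = {i_ma} mA: fr = {f_r:.1f} GHz, gamma = {damping_per_ns(f_r):.1f} 1/ns")
```

The square-root growth of fr against the linear growth of γ with current illustrates why the resonance peak flattens at high bias.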
[Figure 2.11 plots: (a) relaxation resonance fr (GHz) versus √(Ia − Ith) (mA^1/2); (b) damping factor γ (ns−1) versus the squared resonance frequency fr² (GHz²).]
Figure 2.11: Extraction of (a) D-factor = 10.041 GHz·mA−1/2 and (b) K-factor = 0.254 ns and
γ0 = 10.265 ns−1 through straight-line approximation.
The next step involves expressing Po as a function of η(Ia − Ith) for the
equivalent model of Fig. 2.10 and matching it to a second-order frequency
response, leading to
\[ \frac{P_o(f)}{\eta (I_a - I_{th})} = \frac{f_r^2}{f_r^2 - f^2 + j \frac{f}{2\pi} \gamma} \tag{2.7} \]
\[ = \frac{\frac{1}{4\pi^2 L_0 C_a}}{\frac{1}{4\pi^2 L_0 C_a} - f^2 + j \frac{f}{2\pi} \frac{R_0}{L_0}} \tag{2.8} \]
\[ L_0(I_a) = \frac{1}{4\pi^2 C_a D^2 (I_a - I_{th})} \tag{2.9} \]
\[ R_0(I_a) = L_0 (K f_r^2 + \gamma_0) \tag{2.10} \]
\[ = \frac{K}{4\pi^2 C_a} + \frac{\gamma_0}{4\pi^2 C_a D^2 (I_a - I_{th})}. \tag{2.11} \]
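A numerical reading of Eqs. (2.9) and (2.11) is sketched below in Python. The helper name, the 1 pF value chosen for Ca and the use of Ith = 0.83 mA (from the PI fit in Fig. 2.14) are assumptions made for this illustration. The sketch is not the Verilog-A implementation.

```python
import math


def resonator_elements(i_a, i_th, d_factor, k_factor, gamma_0, c_a):
    """Return (L0, R0) following Eqs. (2.9) and (2.11); all arguments in SI units."""
    drive = i_a - i_th
    if drive <= 0.0:
        raise ValueError("the expressions only hold above threshold")
    four_pi2_ca = 4.0 * math.pi ** 2 * c_a
    l0 = 1.0 / (four_pi2_ca * d_factor ** 2 * drive)
    r0 = k_factor / four_pi2_ca + gamma_0 * l0
    return l0, r0


# Extracted values from Fig. 2.11, converted to SI units.
D_SI = 10.041e9 / math.sqrt(1e-3)  # Hz per sqrt(A)
K_SI = 0.254e-9                    # s
GAMMA0_SI = 10.265e9               # 1/s
I_TH = 0.83e-3                     # A, assumed threshold (Fig. 2.14)
C_A = 1e-12                        # F, arbitrary choice for this sketch

l0, r0 = resonator_elements(6e-3, I_TH, D_SI, K_SI, GAMMA0_SI, C_A)
f_r = 1.0 / (2.0 * math.pi * math.sqrt(l0 * C_A))
print(f"L0 = {l0 * 1e12:.1f} pH, R0 = {r0:.2f} ohm, fr = {f_r / 1e9:.2f} GHz")
```

Whatever value is picked for Ca, the resonance 1/(2π√(L0Ca)) and the ratio R0/L0 reproduce fr and γ, which is why Ca can be chosen freely.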
The values of the optical resonator elements L0 and R0 thus depend on the
instantaneous VCSEL current and are found after filling in an arbitrary
value for capacitor Ca in Equations (2.9) and (2.11). The final step involves
implementing the current-dependent network in a Verilog-A model, of which
the code can be consulted in Appendix B.1. The Verilog-A model is
simulated in Cadence Virtuoso under test conditions that emulate the lab
experiments. A 1.5 µm USC VCSEL from [69] was probed and modulated
with a 92GS/s arbitrary waveform generator (AWG) generating an NRZ or
PAM-4 pseudorandom bit sequence (PRBS) with an even run length equal
to 2^7. The input signal was pulse shaped to guarantee critically damped
transitions with a rise and fall time of 0.6UI. The optical output was captured
with an anti-reflective coated lensed fiber and received with a commercially
available 31GHz linear photoreceiver (DSC-R409). Inspection of the
receiver's test report reveals that the frequency response experiences almost
2 dB of peaking. To provide a viable comparison between the measurements
and simulations of the VCSEL, the response of the photoreceiver is
taken into account by cascading the optical output of the VCSEL with
an equivalent lumped network representing the receiver. The frequency
response of the equivalent receiver network Hrec(f) is plotted in Fig. 2.12.
Figure 2.12: Frequency response Hrec ( f ) of the DSC-R409 linear photoreceiver reaches a
bandwidth of 31GHz and experiences almost 2 dB of peaking. The response of a lumped
element network (solid line) was manually fitted to the measurement data from the test report
provided by the manufacturer (dotted line) with best effort.
Simulated and measured eye diagrams at 28Gb/s are shown in Fig. 2.13 for
a modulation swing of 500mVpp and bias currents of 6mA and 8mA. As
expected, the receiver introduces peaking, while increasing the bias current
results in a more damped response. However, the non-linear behavior
of the model looks like an inverted version of the lab experiment, i.e.
overshoot is present on the falling edge instead of on the rising edge. This
inverted behavior is caused by the damping factor of the optical resonator,
which changes proportionally and instantaneously with the modulation current.
Apparently, the damping factor in a real VCSEL experiences
some sort of delay with respect to the modulation current, resulting in a
relaxation oscillation at the rising edges. An attempt was made to improve
the model by including this delay in the current-dependent optical resonator
components. Although the overshoot then did occur on the rising edge, this
method showed mixed results across a wide range of operating conditions
and will not be further elaborated.
It can be concluded that using linearized rate equations to create a
current-dependent optical resonator can introduce non-linearities in a model, but is
not recommended for reliable simulations of VCSEL applications. Hence,
the next approach resolves this problem by inserting the actual rate
equations in the circuit.
[Figure 2.13 panels: (a) Po, (b) Po ∗ hrec, (c) Measured, (d) Po, (e) Po ∗ hrec, (f) Measured.]
Figure 2.13: Eye diagrams at 28Gb/s of the non-linear optical resonator model simulated
with and without cascaded receiver hrec (plotted in mW) show bad correspondence with the
measurements, demonstrated at a bias current of 6mA (a)-(c) and 8mA (d)-(f)
2.4.3 Rate equations
The interaction between the carrier density and the photon density inside
the optical cavity determines the intrinsic static and dynamic response of
the laser and can be described with the coupled single-mode rate equations
Eqs. (2.12) and (2.13) [62]. The physical parameters for these equations are
explained in Table 2.1.
\[ \frac{dS(t)}{dt} = \Gamma g_0 \cdot \frac{N(t) - N_0}{1 + \epsilon S(t)} S(t) - \frac{S(t)}{\tau_p} + \frac{\Gamma \beta N(t)}{\tau_n} \tag{2.12} \]
\[ \frac{dN(t)}{dt} = \frac{I_a(t)}{q V_a} - g_0 \cdot \frac{N(t) - N_0}{1 + \epsilon S(t)} S(t) - \frac{N(t)}{\tau_n} \tag{2.13} \]
The rate of change of photon density S depends on the carrier density
N and therefore also on the current Ia flowing through the active region.
The non-linear behavior of the optical gain G versus photon density,
expressed as G(N, S) = g0 · (N(t) − N0)/(1 + εS(t)), is modeled by the gain compression
factor εS in the denominator. Photon losses occur through free carrier
absorption, mirror losses and scattering and are described by S(t)/τp [70].
Besides stimulated emission, spontaneous emission is generated through
carrier recombination with ΓβN(t)/τn. Carrier density decreases through
stimulated emission g0 · (N(t) − N0)/(1 + εS(t)) · S(t) and non-radiative recombination N(t)/τn.
Table 2.1: List of rate equation parameters
Symbol   Dimension   Description
S(t)     m−3         photon density
N(t)     m−3         carrier density
Ia(t)    A           current through active region
Ith      A           threshold current
N0       m−3         carrier density at transparency (G(N0, S) = 0)
Nth      m−3         carrier density at threshold
ε        m3          gain saturation parameter
Γ        -           longitudinal optical confinement factor
g0       m3/s        gain slope
τp       s           photon lifetime
τn       s           carrier lifetime
β        -           spontaneous emission factor
q        C           electron charge, 1.602 × 10−19 C
Va       m3          volume of the active region
V        m3          volume of the optical cavity
h        Js          Planck constant, 6.626 × 10−34 Js
ν        Hz          laser center frequency
λ        m           wavelength
hν       J           photon energy
η        -           total quantum efficiency
Po(t)    W           optical output power
B        s−2A−1      new rate equation parameter, see Eq. (2.15)
τc       s           new rate equation parameter, see Eq. (2.16)
F        A/W         new rate equation parameter, see Eq. (2.17)
Ise      A           new rate equation parameter, see Eq. (2.18)
X(t)     -           normalized carrier density
The optical output power Po relates to the photon density as follows:
\[ P_o(t) = \frac{V_a \eta h \nu}{2 \Gamma \tau_p} S(t) = \frac{V \eta h \nu}{2 \tau_p} S(t). \tag{2.14} \]
The longitudinal optical confinement factor Γ is the ratio of the active region
volume Va to the optical cavity volume V [70]. It is equivalent to the
photon-electron overlap and is < 1. The output power is proportional to the optical
cavity volume whereas the stimulated emission of photons benefits from a
better optical confinement. In total, nine unknown parameters (ε, Γ, g0, β, τn,
τp, η, N0, Va) are required to solve Equations (2.12) to (2.14). The number
of unknowns can be reduced by transforming the rate equations in order to
group the parameters into newly defined rate equation parameters [71]:
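\[ B = \frac{\Gamma g_0}{q V_a} \tag{2.15} \]
\[ \tau_c = \frac{\epsilon}{g_0} \tag{2.16} \]
\[ F = \frac{2 q \lambda}{h c \eta} \tag{2.17} \]
\[ I_{se} = \frac{\beta}{B \tau_n \tau_p} \tag{2.18} \]
\[ I_{th} = \frac{q V_a N_{th}}{\tau_n} = \frac{q V_a}{\tau_n} \left( N_0 + \frac{1}{\Gamma g_0 \tau_p} \right) \tag{2.19} \]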
The steady-state solution is obtained by setting the time derivatives to zero.
After some algebra [71], the following approximation appears
\[ (F P_o)^2 - (I_a - I_{th} - I_{se}) F P_o - I_{se} I_a \approx 0. \tag{2.20} \]
Solving Eq. (2.20) for output power Po and neglecting the negative root
leads to
\[ P_o = \frac{1}{2F} (I_a - I_{th} - I_{se}) + \frac{1}{2F} \sqrt{(I_a - I_{th} - I_{se})^2 + 4 I_{se} I_a}. \tag{2.21} \]
\[ P_o = \frac{1}{F} (I_a - I_{th}), \quad I_a - I_{th} \gg I_{se} \tag{2.22} \]
Note that the optical power becomes linear for high operating currents, as
described by Eq. (2.22). As such, the model does not take into account compression
of the optical power at the roll-over current caused by thermal effects.
Parameters F, Ith and Ise can be found by fitting the left side of Eq. (2.20) to
zero for values Po and Ia obtained from the PI-curve of a VCSEL operated
below the roll-over current, as demonstrated in Fig. 2.14.
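The static PI-curve of Eq. (2.21) is straightforward to evaluate. The Python sketch below uses the fitted values reported for the USC VCSEL in Fig. 2.14 (F = 2.74 A/W, Ith = 0.83 mA, Ise = 410 nA). The function name is illustrative and thermal roll-over is not modeled.

```python
import math

F = 2.74        # A/W, fitted (Fig. 2.14)
I_TH = 0.83e-3  # A, fitted threshold current (Fig. 2.14)
I_SE = 410e-9   # A, fitted (Fig. 2.14)


def optical_power(i_a):
    """Eq. (2.21): positive root of the steady-state quadratic in F * Po."""
    d = i_a - I_TH - I_SE
    return (d + math.sqrt(d * d + 4.0 * I_SE * i_a)) / (2.0 * F)


for i_ma in (0.4, 0.83, 2.0, 5.0):
    p_o = optical_power(i_ma * 1e-3)
    print(f"Ia = {i_ma} mA: Po = {p_o * 1e3:.4f} mW")
```

Well above threshold the result approaches the straight line (Ia − Ith)/F of Eq. (2.22), while below threshold a small spontaneous-emission power remains.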
[Figure 2.14 plot: optical power (mW) versus VCSEL current Iv (0 to 5 mA), measurement and simulation.]
Figure 2.14: Fitting result shows good correspondence with measured PI-curve of USC
VCSEL below roll-over current with parameters F = 2.74 A/W, Ith = 0.83 mA and
Ise = 410 nA.
The remaining newly defined rate equation parameters can be extracted from
the intrinsic response of the laser, modeled as a second-order response equal
to Hint(f) from Eq. (2.2). It can be shown that the relaxation oscillation
frequency fr and damping factor γ are equal to
\[ f_r = \frac{\sqrt{B (I_a - I_{th})}}{2\pi} \tag{2.23} \]
\[ \gamma = \frac{1}{\tau_n} + K f_r^2 \tag{2.24} \]
\[ K = 4\pi^2 (\tau_p + \tau_c), \tag{2.25} \]
assuming operation in the linear region of the PI-curve [71]. Equations (2.23)
and (2.24) are equivalent to Eqs. (2.5) and (2.6). Hence, solving the equations
for B and τn leads to
\[ B = (2\pi D)^2 \tag{2.26} \]
\[ \tau_n = \frac{1}{\gamma_0}, \tag{2.27} \]
allowing the already extracted parameters K, D and γ0 plotted in
Fig. 2.11 to be reused. With τn = 97 ps, B = 3976 GHz2/mA and τc + τp = 6.43 ps,
the new set of rate equations can be constructed. The photon density S(t)
and carrier density N(t) are hereby replaced with optical power Po(t) and
relative carrier density X(t) = N(t)/Nth, respectively.
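The conversion from the extracted factors to the rate equation parameters is a matter of arithmetic, sketched here in Python with the values of Fig. 2.11. The variable names are chosen for this example.

```python
import math

D = 10.041        # GHz per sqrt(mA), Fig. 2.11(a)
K = 0.254         # ns, Fig. 2.11(b)
GAMMA_0 = 10.265  # 1/ns, Fig. 2.11(b)

b_ghz2_per_ma = (2.0 * math.pi * D) ** 2       # Eq. (2.26)
tau_n_ps = 1e3 / GAMMA_0                       # Eq. (2.27), converted from ns to ps
tau_sum_ps = 1e3 * K / (4.0 * math.pi ** 2)    # tau_p + tau_c from Eq. (2.25), in ps

print(f"B = {b_ghz2_per_ma:.0f} GHz^2/mA")
print(f"tau_n = {tau_n_ps:.1f} ps")
print(f"tau_p + tau_c = {tau_sum_ps:.2f} ps")
```

With the rounded D-factor of Fig. 2.11, B evaluates to roughly 3980 GHz2/mA, close to the 3976 GHz2/mA quoted above. The small difference is presumably due to rounding of D.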
Figure 2.15: Electrical diode access network cascaded with the transformed rate equations
that model the static and dynamic intrinsic behavior of a VCSEL.
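\[ \frac{dP_o(t)}{dt} = \frac{B \tau_n I_{th} (X(t) - 1) + \frac{1}{\tau_p}}{1 + F B \tau_p \tau_c P_o(t)} P_o(t) - \frac{P_o(t)}{\tau_p} + \frac{I_{se} I_{th} B \tau_n}{F} X(t) \tag{2.28} \]
\[ \frac{dX(t)}{dt} = \frac{I_a(t)}{I_{th} \tau_n} - \frac{F B \tau_p (X(t) - 1) + \frac{F}{I_{th} \tau_n}}{1 + F B \tau_p \tau_c P_o(t)} P_o(t) - \frac{X(t)}{\tau_n} \tag{2.29} \]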
The equivalent VCSEL model is constructed in Verilog-A by cascading the
electrical access network with Equations (2.28) and (2.29), see Fig. 2.15 and
Appendix B.2.
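To give a feel for the dynamics described by Eqs. (2.28) and (2.29), the following Python sketch integrates them for a current step using a classic fourth-order Runge-Kutta scheme. It is an illustration only and not the Verilog-A model of Appendix B.2. The parameter values are those quoted in this chapter for the USC VCSEL (with τp = 2.6 ps from Fig. 2.16), while the starting point (1 µW, X = 1), the time step and the function names are assumptions of this sketch. The electrical access network is left out.

```python
# Values quoted in the text for the USC VCSEL, converted to SI units.
B = 3976e18 / 1e-3        # 3976 GHz^2/mA expressed in 1/(s^2 A)
TAU_N = 97e-12            # carrier lifetime in s
TAU_P = 2.6e-12           # photon lifetime in s (Fig. 2.16)
TAU_C = 6.43e-12 - TAU_P  # from tau_c + tau_p = 6.43 ps
F = 2.74                  # A/W
I_TH = 0.83e-3            # threshold current in A
I_SE = 410e-9             # A


def derivatives(p_o, x, i_a):
    """Right-hand sides of Eqs. (2.28) and (2.29) for power p_o and density x."""
    compression = 1.0 + F * B * TAU_P * TAU_C * p_o
    gain = (B * TAU_N * I_TH * (x - 1.0) + 1.0 / TAU_P) / compression
    dp = gain * p_o - p_o / TAU_P + I_SE * I_TH * B * TAU_N / F * x
    stim = (F * B * TAU_P * (x - 1.0) + F / (I_TH * TAU_N)) / compression
    dx = i_a / (I_TH * TAU_N) - stim * p_o - x / TAU_N
    return dp, dx


def simulate_step(i_a, t_end=1e-9, dt=5e-14, p_start=1e-6, x_start=1.0):
    """Integrate the response to a constant current i_a with classic RK4."""
    p_o, x = p_start, x_start
    trace = []
    for _ in range(int(round(t_end / dt))):
        k1p, k1x = derivatives(p_o, x, i_a)
        k2p, k2x = derivatives(p_o + 0.5 * dt * k1p, x + 0.5 * dt * k1x, i_a)
        k3p, k3x = derivatives(p_o + 0.5 * dt * k2p, x + 0.5 * dt * k2x, i_a)
        k4p, k4x = derivatives(p_o + dt * k3p, x + dt * k3x, i_a)
        p_o += dt * (k1p + 2.0 * k2p + 2.0 * k3p + k4p) / 6.0
        x += dt * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
        trace.append(p_o)
    return trace, x


trace, x_final = simulate_step(6e-3)
print(f"settled Po = {trace[-1] * 1e3:.2f} mW, peak Po = {max(trace) * 1e3:.2f} mW")
```

The turn-on transient overshoots before settling, which is the relaxation oscillation that the linearized resonator of Section 2.4.2 could not place on the correct edge. The settled power lies slightly below (Ia − Ith)/F because the gain compression term keeps X above 1.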
The last unknown parameter τp, which immediately sets τc through Eq. (2.25),
can be estimated by fitting the small-signal response from the model to the
S21 result for a certain bias current. Qualitative fitting occurs at a low bias
current showing a distinct relaxation oscillation peak. Although this extrac-
tion method differs from the more accurate approaches proposed in [71],
it does not require additional experiments. The fitting result is plotted in
Fig. 2.16 for a USC VCSEL biased at 2.1mA.
[Figure 2.16 plot: magnitude (dB) versus frequency (0 to 20 GHz); curves: measured, τp = 2 ps, τp = 2.6 ps and τp = 3.6 ps.]
Figure 2.16: The measured and simulated small-signal response of a USC VCSEL biased at
2.1mA shows best correspondence for a photon lifetime τp = 2.6 ps
The accuracy of the rate-equation based model for a USC VCSEL is
verified through the modulation experiments that were introduced in
Section 2.4.2. Again, the response of the photoreceiver is taken into account.
Figures 2.17 and 2.18 show NRZ and/or PAM-4 eye diagrams at 28GBd and
40GBd for bias currents of 6mA and 8mA using a modulation swing of
500 mVpp ≈ 8.5 mApp. The non-linear behavior monitored at the receiver
output hrec corresponds quite well with the measurements in terms of
relaxation oscillation and asymmetry in the rising and falling edge. As expected,
the damping of the relaxation oscillation increases with the bias current.
Unfortunately, the PAM-4 signal looks overly optimistic, experiencing less
ISI than the real VCSEL. Whether or not this is purely related to the VCSEL
should be further investigated by testing with other receivers. The peaked
frequency response of the receiver is capable of boosting the falling edge
at the cost of an underdamped relaxation oscillation. This is particularly
detrimental for the opening of the upper eye in the PAM-4 signal.
The transformed rate equations combined with the electrical input network
are very capable of reflecting the static and dynamic behavior of the VCSEL.
Above all, parameter extraction is relatively fast and simple as it only requires
S-parameter measurements and PIV-data of the VCSEL. The application is
however limited to intensity modulation/direct detection systems because
the phase of the electric field was not extracted for this thesis. In addition,
temperature dependence is preferably included in the model as data center
interconnects are required to operate across a wide temperature range.
[Figure 2.17 panels: (a) Po, (b) Po ∗ hrec, (c) Measured, (d) Po, (e) Po ∗ hrec, (f) Measured, (g) Po, (h) Po ∗ hrec, (i) Measured.]
Figure 2.17: Eye diagrams at 28GBd of the model using transformed rate equations
simulated with and without cascaded receiver hrec (plotted in mW) match well with the
measurements, demonstrated at a bias current of 6mA (a)-(c) and 8mA with NRZ and
PAM-4 respectively (d)-(i).
[Figure 2.18 panels: (a) Po, (b) Po ∗ hrec, (c) Measured, (d) Po, (e) Po ∗ hrec, (f) Measured.]
Figure 2.18: Eye diagrams at 40Gb/s of the model using transformed rate equations
simulated with and without cascaded receiver hrec (plotted in mW) match well with the
measurements, demonstrated at a bias current of 6mA (a)-(c) and 8mA (d)-(f).
2.5 Conclusion
Constructing an equivalent model of a laser starts by mapping the electrical
access network to the physical structure of the laser. The model is then
completed by including an equivalent network for the intrinsic response that
is fitted to a two-port S-parameter characterization. Not only is this helpful
in acquiring a deterministic relation between the component values and the
bias current, it also provides insight into the limitations of the design. For
example, the impact on the electrical and intrinsic bandwidth of VCSELs
fabricated with different mesa diameters can be verified.
The small-signal equivalent of the VCSEL needs to be extended with non-
linear elements to optimize the design flow of the driver IC. A few approaches
are discussed, but in the end, the transformed rate equations combined with
the electrical diode input network are most effective in reflecting the static
and dynamic behavior of the VCSEL. Above all, parameter extraction is
relatively fast and simple as it only requires S-parameter measurements and
PIV -data of the VCSEL. A further enhancement to the model would be to
include a temperature dependence in the static and dynamic behavior, as
data center interconnects are required to operate across a wide temperature
range, with case module and substrate temperatures reaching 70 °C and 90 °C
respectively [9, 26, 72].
3 Co-optimization between the driver-VCSEL interface and on-chip feed-forward equalization
Data rates specified in emerging data communication standards tend to
surpass the bandwidth of state-of-the-art VCSELs. Furthermore, the
interconnection—be it a wire bond, a flip-chip solder bump or a trace
routed on an interposer or a printed circuit board (PCB)—can significantly
alter the modulation response of the transmitter, which makes the design of
the integrated driver circuit all the more challenging. Incorporating FFE,
choosing the optimal output impedance of the driver, changing the bias
current of the VCSEL, etc., can lead to the desired performance but results
in a large design space that needs to be explored. In this chapter, we propose
a design flow based on a small-signal model of the VCSEL that is capable of
speeding up the design process. More specifically, the output impedance and
the FFE topology are co-optimized for a given interconnection and a range
of VCSEL bias currents. Experimental verification using a 4-tap PAM-4
driver validates the effectiveness of the proposed flow.
36 Chapter 3
Figure 3.1: Lumped network representation of the driver output connected to the small-signal
model of a VCSEL.
3.1 Modeling the driver-VCSEL interface
The output stage of the driver can be modeled by its Norton equivalent
with output resistance Rd and current source Id. The current source will be
implemented as a finite impulse response (FIR) filter that aims to compensate
the response of the laser with respect to a certain data rate. The load
capacitance Cd is composed of the capacitance associated with the bond pad,
the electrostatic discharge (ESD) diodes and the parasitic capacitance of
the drive transistors. The interconnection is modeled as an inductance Ld
and represents inductive effects from e.g. a wire bond or a routed trace.
The small-signal model of the VCSEL is adapted from Fig. 2.3 and is
used in the complete E/O lumped network shown in Fig. 3.1. The
driver-VCSEL interface is characterized by the E/O transfer function Hvcd(f) = po/id.
Figure 3.2 shows the normalized transfer function of a > 16 GHz TUM
VCSEL for different driver configurations. These VCSELs are in fact the
first generation of VCSELs in the EU FP7 project MIRAGE and were used as
the basis for the design of the PAM-4 driver described in Chapter 4. The first
configuration corresponds to a probed VCSEL driven by a 50 Ω generator
(Cd = 0, Ld = 0), whereas the other configurations correspond to the output
stage of a driver wire bonded to the VCSEL (Cd = 140 fF, Ld = 600 pH). It
can be observed that the interconnection can introduce peaking and a steeper
roll-off, while a higher drive impedance Rd can counter-intuitively extend
the bandwidth from 18 to 21GHz at the cost of stronger ringing. Reducing
the bias current of the VCSEL deteriorates the bandwidth and emphasizes
peaking in the frequency response.
A time domain approach will be used to determine the optimal tap coeffi-
cients that compensate Hvcd. Therefore, the transfer function needs to be
converted to an impulse response hvcd by calculating the inverse Laplace
transformation. The associated impulse response and step response are
displayed in Fig. 3.2. As expected, peaking in the frequency response leads
to ringing in the time domain.
[Figure 3.2 plots: (a) normalized |Hvcd| (dB) versus frequency (GHz); (b) impulse response hvcd versus time (ps); (c) step response of hvcd versus time (ps). Curves: Rd = 50 Ω, Iv = 9.75 mA; Rd = 50 Ω, Ld = 600 pH, Cd = 140 fF, Iv = 9.75 mA; Rd = 160 Ω, Ld = 600 pH, Cd = 140 fF, Iv = 9.75 mA; Rd = 160 Ω, Ld = 600 pH, Cd = 140 fF, Iv = 6 mA.]
Figure 3.2: (a) Transfer function, (b) impulse response and (c) step response of a > 16 GHz
VCSEL biased at 6mA and 9.75mA for different driver configurations.
[Figure 3.3 plot: settling time (ps) versus driver impedance Rd (Ω) for Ld = 250 pH and 600 pH at Iv = 6 mA and 9.75 mA.]
Figure 3.3: Settling time of hvcd plotted versus driver impedance Rd for an interconnect
inductance Ld of 250 pH and 600 pH and a VCSEL current Iv of 6mA and 9.75mA.
It can be observed that the driver parameters
Rd, Cd and Ld together with the VCSEL current Iv significantly affect the
channel response hvcd. Ideally, fast rise times with minimal ringing are
preferred to maximize the link performance and limit the complexity of
the equalizer. In order to find the optimal set of driver parameters, the
step response is evaluated across a wide range of Rd for multiple Ld and
Iv scenarios. An interesting property of the step response is the settling
time1,because it encompasses the damping behavior as well as the transition
time. Figure 3.3 reveals that a low output impedance Rd is not necessarily the
best option because it can experience an underdamped oscillation governed
by a long settling time. Even though it is characterized by the fastest rise
time as shown in Fig. 3.4. The settling time becomes worse when the
inductance Ld increases or the VCSEL bias is reduced, as both actions result
in additional overshoot. Interestingly, the rise time can be dominated by the
interconnect inductance, showing almost no dependency on Rd and can be
exploited to speed up the rising edge even when implementing a high Rd.
The optimal step response requires a heavily biased laser driven with an
output impedance between 50W and 100W, while using a low-inductive
1The settling time is defined here as the time where the difference between the step
response y(t) and the steady state value y f inal (t) fluctuates within 5% of the steady state
value. The channel delay is de-embedded from the experiment by subtracting the time where
y(t) ≥ 0.01y f inal (t) from the settling time.
Co-optimization between the driver-VCSEL interface and on-chip FFE 39
0 50 100 150 200 250 300 350 400
15
20
25
30
35
40
Driver impedanceRd (W)
10
-9
0%
Ri
se
tim
e(
ps
)
600 pH, 6mA 600 pH, 9.75mA
250 pH, 6mA 250 pH, 9.75mA
Figure 3.4: Rise time of hvcd plotted versus driver impedance Rd for an interconnect
inductance Ld of 250 pH and 600 pH and a VCSEL current Iv of 6mA and 9.75mA.
interconnect. Unfortunately, not all of the drive current Id is delivered to the laser,
because part of it is dissipated in the back-termination resistor Rd. Hence,
a drive efficiency factor m = Iv(0)/Id(0) can be introduced that models the current
gain at DC, see Fig. 3.5. The optimal Rd mentioned before (between 50 Ω
and 100 Ω) is thus characterized by an efficiency between 40% and 62%.
As m drops, the driver needs to generate a larger drive current to obtain a
given laser current. Since the dimensions of the output drive transistors are
proportional to the current, the capacitive loading likewise increases with
current. For this example, a 0.13 µm SiGe BiCMOS driver IC is assumed
that loads the output node with 40 fF/m. This capacitance is in parallel with the
bond pad and ESD capacitance (estimated at 120 fF); together they make up the
complete driver capacitance Cd. The dependency of Cd on Rd is plotted in
Fig. 3.5. Note that this dependency was included in the step response
simulations to guarantee equal laser drive conditions for all values of Rd.
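The relation between drive efficiency and capacitive loading described above can be sketched numerically. The snippet below is a minimal illustration, assuming that the driver capacitance is simply the 120 fF pad and ESD capacitance plus 40 fF divided by m; the function and parameter names are hypothetical and are not taken from the simulations in this work.

```python
def driver_capacitance_fF(m, c_unit_fF=40.0, c_pad_fF=120.0):
    """Total driver capacitance Cd (fF) for a drive efficiency m in (0, 1].

    Assumption: the output transistors load the node with c_unit_fF / m,
    in parallel with a fixed bond pad and ESD capacitance c_pad_fF.
    """
    if not 0.0 < m <= 1.0:
        raise ValueError("drive efficiency m must lie in (0, 1]")
    return c_pad_fF + c_unit_fF / m


# Efficiencies quoted in the text for Rd between 50 and 100 ohm.
for m in (0.40, 0.62):
    print(f"m = {m:.2f} -> Cd = {driver_capacitance_fF(m):.1f} fF")
```

A lower efficiency m therefore costs twice: more drive current is needed and the output node becomes more heavily loaded.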
The 10-90% rise time of the E/O response hvcd is around 20 to 25 ps,
which generates ISI at 40 Gb/s if this channel is cascaded with a bandwidth-
limited driver and receiver. Hence, equalization of the response will be
required to reduce the rise time and dampen the ringing. In terms of energy
efficiency, it is worthwhile investigating whether or not equalization allows
utilizing a large back-termination resistor or running the VCSEL from a
Figure 3.5: Drive efficiency m = Iv0/Id0 and driver capacitance Cd (in fF) plotted versus driver
impedance Rd (in Ω) for a VCSEL current Īv of 6 mA.
lower bias current while still achieving 40 Gb/s.
3.2 Equalization of the channel as a function of the driver impedance
In order to study the gain in performance after including an FFE in the
VCSEL driver, a system model of an optical link must be constructed to find
the FFE coefficients that minimize the ISI. The system shown in Figure 3.6 is
a cascade of the impulse response of the driver (prect(t) ∗ hgauss(t) ∗ hffe(t)),
of the driver-VCSEL interface (hvcd(t)) and of the receiver (hrec(t)). The
data symbols a(k), transmitted at a symbol rate of 1/T, are placed on rectangular
pulses prect(t). These pulses are shaped with a Gaussian filter hgauss(t)
to model the finite bandwidth of the driver circuit. The equalizer is an FIR
filter featuring NFFE tap coefficients C(n) spaced apart by a time delay of
T/M. The factor M determines whether the equalizer is a symbol-spaced (M = 1) or
fractionally-spaced (M > 1) filter. The conversion from electrical current to optical
power in the transmitter and vice versa in the receiver is linearized, leading
to a channel response g(t) that comprises the whole link, excluding fiber
dispersion effects. Using a least-squares solver², the coefficients C(n) can be
² The built-in Matlab function lsqcurvefit was used.
Figure 3.6: Block diagram of a linearized BTB optical link to find the coefficients C(n) of
hffe(t) using a least-squares solver that minimizes the ISI present in the received signal
r(k).
found that minimize the sum of the squared errors between the transmitted
and received symbols, Eq. (3.3) [73].
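$$ y\left(k\frac{T}{M}\right) = \sum_{n} a(n) \cdot g\left(k\frac{T}{M} - nT\right) \tag{3.1} $$
$$ r(kT) = \sum_{n=0}^{N_{FFE}-1} y\left(kT - n\frac{T}{M}\right) \cdot C(n) \tag{3.2} $$
$$ \min_{C(n)} \sum_{k}^{N} \bigl(r(k) - a(k)\bigr)^2 = \min_{C(n)} S \tag{3.3} $$
$$ \mathrm{MSE} = \frac{S}{N} \tag{3.4} $$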
For the remainder of this chapter, we will restrict the FFE discussion to a
symbol-spaced equalizer such that it can serve as a basis for the design of the
PAM-4 driver from Chapter 4, which implements the tap delay with full-rate
flip-flops. In addition, the bandwidth of the Gaussian filter is set at 30 GHz,
while the data rate amounts to 40 Gb/s. The first simulation experiment
observes whether the mean square error (MSE) defined by Eq. (3.4) improves
equally for various driver impedances Rd while consecutively increasing the
number of symbol-spaced taps NFFE. The driver-VCSEL interface consists
of a 16 GHz VCSEL biased at 6 mA that is connected to the driver through
an inductance of 250 pH. Driver impedances of 20 Ω, 50 Ω, 125 Ω and
375 Ω are evaluated because they correspond to extrema in the settling time
of Fig. 3.3. For now, the receiver response is assumed flat with hrec = 1.
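The least-squares search for the tap coefficients in Eqs. (3.1) to (3.4) can be illustrated with a short sketch. This is not the Matlab lsqcurvefit flow used in this work: it assumes a symbol-spaced equalizer (M = 1), uses an unconstrained linear least-squares fit from NumPy, and applies it to a hypothetical three-sample pulse response that merely stands in for g(t). The function name and the example channel are illustrative assumptions.

```python
import numpy as np


def solve_ffe_taps(g, n_taps, n_symbols=2000, seed=0):
    """Least-squares fit of symbol-spaced FFE taps C(n) for a sampled channel g.

    g       : channel pulse response sampled once per symbol (main cursor first)
    n_taps  : number of FFE taps N_FFE
    returns : (taps, mse) with mse = S / N as in Eq. (3.4)
    """
    rng = np.random.default_rng(seed)
    a = rng.choice([-1.0, 1.0], size=n_symbols)   # random NRZ symbols a(k)
    y = np.convolve(a, g)[:n_symbols]             # Eq. (3.1) with M = 1
    # Column n of Y holds y delayed by n symbols, so Y @ C implements Eq. (3.2).
    Y = np.column_stack(
        [np.concatenate([np.zeros(n), y[: n_symbols - n]]) for n in range(n_taps)]
    )
    taps, *_ = np.linalg.lstsq(Y, a, rcond=None)  # minimizes S of Eq. (3.3)
    mse = float(np.mean((Y @ taps - a) ** 2))
    return taps, mse


# Hypothetical pulse response with post-cursor ISI only (not taken from this work).
g_example = np.array([1.0, 0.45, 0.15])
for n_ffe in range(1, 5):
    c, mse = solve_ffe_taps(g_example, n_ffe)
    print(n_ffe, np.round(c, 3), round(mse, 4))
```

Because the model is linear, applying the taps after the channel, as done here, is equivalent to applying them in the driver. Adding taps can only lower or maintain the MSE in this unconstrained setting.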
Figure 3.7: (a) MSE and (b) max(|C(n)|) versus NFFE for Rd = 20 Ω, 50 Ω, 125 Ω and 375 Ω.
(a) Increasing the depth of the equalizer NFFE reduces the MSE to comparable levels
for all driver impedances. (b) However, this requires much stronger
coefficients C(n) for higher impedances.
Although a low driver impedance results in the smallest MSE without equalization,
Fig. 3.7 reveals that increasing the depth of the equalizer NFFE can bring
the MSE down to comparable levels for all cases. However, this requires
much stronger coefficients C(n) for higher impedances. These coefficients
correspond to the magnitude of the tail current sources when constructing the
equalizer in a current-mode logic (CML) topology. In case the driver is
DC-coupled to the VCSEL as in Fig. 3.1, these current sources together
with the bias current Ib will contribute to the average current through the
VCSEL Īv. Therefore, C(n) cannot be scaled indefinitely when targeting a
specific modulation amplitude IM and operating current Iv for the VCSEL.
Figure 3.8 shows the PI-curve of the VCSEL under test annotated with the
current and optical power levels, as well as an example of a 2-tap equalized
current waveform. Typically, the laser is biased to obtain the desired
frequency or step response characteristics and is modulated around this
operating point. Care should be taken to avoid modulating the laser current
below the threshold current Ith, otherwise a random turn-on delay combined
with a severe relaxation resonance will distort the signal [74]. Hence, the
low-level current Iv0 should be located above Ith, a condition that can be satisfied
by adapting the bias current Ib. A parameter that effectively expresses how
close the VCSEL is modulated to the point where it stops lasing (at
Ith) is the extinction ratio (ER). A typical VCSEL operates at an ER of 5 dB
for high-speed operation [46,47]. The analysis below derives a constraint
on the magnitude of C(n) with respect to the average VCSEL current Īv and
the ER, using Fig. 3.8 as a guideline.
Figure 3.8: (a) Static and dynamic PI-curve (optical power in mW versus VCSEL current Iv
in mA, annotated with Ith, Iv0, Īv, Iv1 and Po0, P̄o, Po1) of the VCSEL under test biased at an
average current Īv of 6 mA and modulated with an ER of 5 dB. (b) Example of a symbol-spaced
2-tap equalized current waveform denoting all the current levels, assuming C(2) < 0 for
boosting and C(2) > 0 for dampening the transitions.
Applying equalization to a signal will either increase or decrease the
modulation swing, depending on which part of the frequency response needs to
be boosted. The final swing IM is found in Eq. (3.5) after summing all the
coefficients C(n) that are multiplied with the modulation current Im. The
contribution of the tap coefficients to the low-level current Iv0 is equivalent
to taking the absolute value of C(n), see Eq. (3.8).
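$$ I_M = I_{v1} - I_{v0} = I_m \sum_{n}^{N_{FFE}-1} C(n) \tag{3.5} $$
$$ \bar{I}_v = I_b + \frac{I_m}{2} \sum_{n}^{N_{FFE}-1} |C(n)| \tag{3.6} $$
$$ I_{v1} = \bar{I}_v + \frac{I_M}{2} \tag{3.7} $$
$$ I_{v0} = \bar{I}_v - \frac{I_M}{2} = I_b + \frac{I_m}{2} \sum_{n}^{N_{FFE}-1} \bigl(|C(n)| - C(n)\bigr) \tag{3.8} $$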
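Equations (3.5) to (3.8) can be evaluated directly for a given tap set. The sketch below is a minimal illustration; the function name, the 2-tap example and the chosen values of Im and Ib are hypothetical and were only picked so that the resulting levels land near the 6 mA operating point discussed in the text.

```python
def current_levels(taps, i_mod_mA, i_bias_mA):
    """Steady-state VCSEL current levels for FFE taps C(n), per Eqs. (3.5)-(3.8)."""
    sum_c = sum(taps)
    sum_abs_c = sum(abs(c) for c in taps)
    swing = i_mod_mA * sum_c                          # Eq. (3.5): IM
    average = i_bias_mA + 0.5 * i_mod_mA * sum_abs_c  # Eq. (3.6): average current
    high = average + 0.5 * swing                      # Eq. (3.7): Iv1
    low = average - 0.5 * swing                       # Eq. (3.8): Iv0
    return {"IM": swing, "Iv_avg": average, "Iv1": high, "Iv0": low}


# Hypothetical 2-tap setting with a negative second tap (assumed values).
levels = current_levels(taps=(1.2, -0.2), i_mod_mA=5.2, i_bias_mA=2.36)
for name, value in levels.items():
    print(f"{name} = {value:.2f} mA")
```

Note that a negative tap lowers the swing through the sum of C(n) but still raises the average current through the sum of |C(n)|, which is why the taps cannot be scaled freely.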
Iv0 can also be written as a function of the ER, which is given in Eq. (3.10).
Note that the ER formula can be transformed from the optical power
domain to the electrical current domain by reasoning on the dynamic
PI-curve from Fig. 3.8, see Eq. (3.9). This is a linear extrapolation above
the operating point that maintains the slope efficiency η related to Īv. This
assumption is valid as long as the modulation rate is faster than the thermal
bandwidth of the laser, preventing instantaneous heating of the active region
in the laser diode [70]. Before equating Eq. (3.10) to Eq. (3.8), Im/2 must
first be substituted with its equivalent form of Eq. (3.11).
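$$ ER = \frac{P_1}{P_0} \overset{P = \eta(\bar{I}_v)(I - I_{th})}{=} \frac{I_{v1} - I_{th}}{I_{v0} - I_{th}} \tag{3.9} $$
$$ I_{v0} = \frac{2\bar{I}_v + I_{th}(ER - 1)}{ER + 1} \tag{3.10} $$
$$ \frac{I_m}{2} = \bar{I}_v - I_{v0} = \frac{2(ER - 1)\left(\bar{I}_v + I_{th}\right)}{(ER + 1)\sum_{n}^{N_{FFE}-1} C(n)} \tag{3.11} $$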
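As a numerical check of Eqs. (3.9) and (3.10), the low and high current levels can be computed from the average current, the threshold current and the ER. The sketch below assumes that the ER in dB is converted with 10·log10 of the optical power ratio and takes Ith = 1 mA, which is an assumed reading of the annotated PI-curve of Fig. 3.8; the function name is hypothetical.

```python
def levels_from_er(i_avg_mA, i_th_mA, er_dB):
    """Low and high VCSEL current levels for a given extinction ratio.

    Iv0 follows Eq. (3.10); Iv1 follows from the average current being the
    midpoint of the two levels (Eqs. (3.7) and (3.8)).
    """
    er = 10.0 ** (er_dB / 10.0)  # linear optical power ratio
    i_low = (2.0 * i_avg_mA + i_th_mA * (er - 1.0)) / (er + 1.0)
    i_high = 2.0 * i_avg_mA - i_low
    return i_low, i_high


# Operating point from the text: 6 mA average current and an ER of 5 dB.
i_v0, i_v1 = levels_from_er(i_avg_mA=6.0, i_th_mA=1.0, er_dB=5.0)
print(f"Iv0 = {i_v0:.2f} mA, Iv1 = {i_v1:.2f} mA")
```

Under these assumptions the levels come out close to the 3.4 mA and 8.6 mA marks on the current axis of Fig. 3.8.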
The remaining unknown variable in Eq. (3.8) is Ib. An extreme scenario can
exist where the term with |C(n)| is entirely used for biasing the laser (a technique
already exploited in [75]), forcing Ib to zero. This implies a constraint on
|C(n)| dictated by the average laser current Īv, the ER and Ith, see Eq. (3.12).
$$ \frac{\sum_{n}^{N_{FFE}-1} \bigl(|C(n)| - C(n)\bigr)}{\sum_{n}^{N_{FFE}-1} C(n)} \le \frac{2\bar{I}_v + I_{th}(ER - 1)}{(ER + 1)\left(\bar{I}_v + I_{th}\right)} \tag{3.12} $$
Including this constraint in the least-squares solver leads to the results
displayed in Fig. 3.9. The maximum value of |C(n)| has clearly been reduced
for all driver impedances. However, including the constraint degrades
the overall performance, as not even a 6-tap FFE is able to bring the MSE
down as low as in Fig. 3.7. Above all, the lowest MSE is achieved while
keeping Rd below 50 Ω, although Rd = 125 Ω limits the penalty to 30%.
The plot also shows that the largest gain in performance is achieved by
opting for an NFFE equal to 4 taps. In contrast, the MSE for Rd = 375 Ω is
twice as large and shows no improvement above 2 taps.
As already shown in Fig. 3.5, lowering Rd reduces the drive efficiency m
of the driver-VCSEL interface. This implies that the tap coefficients C(n)
need to be scaled accordingly in order to establish the same optical
modulation swing. Figure 3.10 plots the sum of the 1/m-scaled coefficients |C(n)|,
which represents the total tail current of the equalizer. As can be observed,
a higher driver impedance is favored in terms of power consumption. For
example, a 4-tap FFE output stage combined with Rd = 50 Ω would consume
Figure 3.9: (a) MSE and (b) max(|C(n)|) versus NFFE for Rd = 20 Ω, 50 Ω, 125 Ω and 375 Ω.
(a) The performance degrades for higher output impedances when including the
constraint on the magnitude of the tap coefficients, because (b) the magnitude of the main
coefficient has been reduced automatically by the least-squares solver.
Figure 3.10: Sum of |C(n)|/m versus NFFE for Rd = 20 Ω, 50 Ω, 125 Ω and 375 Ω. Taking into
account the drive efficiency m results in a larger tail current
required for low Rd to provide the modulation current.
less than a 1-tap FFE with Rd = 20 Ω.³
In conclusion, the optimal trade-off for this example would probably coincide
with a 4-tap FFE combined with an Rd of 125 Ω when slightly prioritizing
energy efficiency. Note that there is one caveat associated with the
constraint on C(n) in Eq. (3.12): it only fixes the steady-state modulation
levels to be equal to the specified Iv0 and Iv1. The instantaneous
amplitude for isolated bits running at the maximum data rate can undergo
larger swings after equalization, as illustrated in Fig. 3.8b. There is no guarantee
that the drive current will not drop below the threshold current, and it is unclear what the
³ Of course, a 4-tap FFE driver would overall consume more than a 1-tap driver because
of the preceding amplifier and delay stages. In this example, only the power consumption of
the output stage is evaluated.
response of the VCSEL will be for these isolated bits. Fortunately, part of
the equalization is used to compensate the bandwidth limitation imposed by
the electrical access network Hea(f) = ia/id. Hence, the peak at the bottom
level of the signal will already be reduced before it arrives in the active
region, preventing Ia from dropping below Ith. Nevertheless, verification with a non-linear
large-signal model of the VCSEL is a mandatory next step after this
system-level approach.
3.3 Verification of the equalized driver-VCSEL interface
The aforementioned system analysis of the equalized driver-VCSEL interface
shows great potential for rapidly optimizing the power-performance trade-off.
However, it has one main drawback: it is entirely based on
linearized models derived from small-signal experiments. It is therefore
essential to verify its outcome against large-signal VCSEL modulation
experiments.
Section 4.5.4 and [45] investigate the impact of a 4-tap symbol-spaced FFE
on the BER by consecutively enabling an extra tap coefficient for each
transmission experiment. In these experiments, a 0.13 µm SiGe BiCMOS PAM-4
driver IC modulates a 22 GHz USC VCSEL at 28 GBd. The operating
conditions are emulated in simulation by setting the following parameters:
Īv = 6 mA, Rd = 160 Ω, Ld = 300 pH and Cd = 260 fF. The constraint of
Eq. (3.12) is included in the least-squares solver. This time the receiver
response is not ideal, but corresponds to the frequency response of the
DSC-R409 receiver shown in Fig. 2.12. The MSE, which can be observed
in Fig. 3.11a, experiences the biggest improvement when transitioning to a
3-tap FFE. Enabling the fourth tap brings the error down further, but the
performance gain is less drastic. This is better illustrated by monitoring the
eye height shown in Fig. 3.11b. Both graphs show that a 2-tap FFE is
not effective at this data rate, as the second tap is basically forced to zero by
the least-squares solver. More importantly, the trends observed in simulation
correspond to the BER experiments from Fig. 3.12. Due to the non-linear
modulation behavior of the VCSEL and the peaked response of the receiver,
the BER of the optical link is limited by ISI instead of receiver noise. Hence,
the effectiveness of the equalizer can be quantitatively verified by observing
BER curves for different FFE settings. Even though the experiment focused
on PAM-4 modulation, this makes no difference for the simulation results,
as the model is linear. In addition, the eye diagrams before and after 4-tap
Figure 3.11: A 28 GBd simulation of a modulated USC VCSEL cascaded with a modeled
DSC-R409 receiver reveals that most gain in (a) MSE and (b) eye height is achieved after
enabling 3 tap coefficients (both plotted versus NFFE from 1 to 4).
Figure 3.12: BER plots (BER versus average input power in dBm, with the FEC limit marked)
vs. number of FFE taps for a PAM-4 modulated USC VCSEL at
28 GBd [45].
equalization for the simulation and the experiment also match quite well in
terms of ringing and jitter, see Fig. 3.13. Note that the modulated VCSEL
appears to experience far less ringing on the bottom level compared to the
linear simulation model, an effect that can be attributed to the asymmetric
modulation characteristics of the VCSEL under test.
Figure 3.13: Respectively 1 and 4 FFE taps enabled at 28 GBd for (a), (b) a simulation of
the driver-VCSEL interface (optical output Po versus time in ps) and (c), (d) a VCSEL driver
modulation experiment (60 mV/div, 10 ps/div).
3.4 Conclusion
The design of a VCSEL driver IC needs to take into account many variables,
such as driver impedance, interconnect inductance and laser bias current. This
results in a large design space that needs to be explored, which consumes
a lot of computing resources, especially if executed at the transistor level. We
have introduced a linear model of the driver output stage that interfaces with
the small-signal lumped network of the VCSEL in an attempt to reduce
simulation time while providing reliable results. Transforming the frequency
response of the driver-VCSEL interface to the impulse response allows
meaningful time-domain properties, such as the rise time and
settling time, to be verified. It quickly gives an idea of whether the system bandwidth
is sufficient for the specified data rate and whether equalization will be necessary.
An FIR filter is easily included in the driver-VCSEL interface model and
introduces another degree of freedom for the design, i.e. the number of tap
coefficients and the time delay between them. In fact, a linearized model
of an optical link can be constructed that includes the impulse response
of the driver, the interface to the VCSEL and the receiver. Sweeping
the driver impedance against the number of taps, while the
coefficients are determined by a least-squares solver that minimizes the ISI
in the received signal, visualizes which combination leads to the lowest
MSE. Including the drive efficiency m and a constraint on the magnitude of
the FFE coefficients in the verification process makes it possible to find the optimal
trade-off between power consumption and performance. Even though this
methodology is entirely based on linearized small-signal models, it proves
to be a well-grounded and efficient first step in the design flow of a VCSEL
driver, since similar trends were observed in actual BER experiments on a
VCSEL driver assembly.

4 PAM-4 VCSEL driver with a 4-tap FFE output stage
Considering the standardization of 400 Gigabit Ethernet, there is a strong
push from industry to target data rates of 50 Gb/s and beyond, as discussed
in Chapter 1. Unfortunately, ISI-free NRZ modulation at 50 Gb/s is not
feasible without strong equalization, because the bandwidth of the fastest
long-wavelength VCSEL, reported at 22 GHz [32], is still below the Nyquist
bandwidth of 25 GHz. This problem can be avoided by implementing a
modulation format with a higher spectral efficiency, such as multi-level pulse
amplitude modulation. For intensity-modulated/direct-detection optical
links, this is achieved by dividing the full range of the optical power into
M levels, where each level is equivalent to a symbol and each symbol
represents log2 M bits. The symbol rate is further referred to as the baud
rate and is a factor log2 M lower than the bit rate. Hence, a compromise
exists between a loss in signal-to-noise ratio (SNR) and a gain in spectral
efficiency, as can be observed in Table 4.1. Due to the limited optical
output power of a few mW and the low ER (≈ 5 dB) needed to preserve linearity
of a high-speed long-wavelength VCSEL [46,69], the sweet spot for such
a VCSEL link lies at M = 4. This is equivalent to PAM-4 modulation
and effectively doubles the spectral efficiency. However, not many PAM-4
driver ICs are available for >40 Gb/s VCSEL links. Hence, within the
FP7 project MIRAGE, a dual-channel PAM-4 driver chip was designed to
test the feasibility and performance of 50-80 Gb/s long-wavelength VCSEL
links. This chapter starts by presenting and specifying the complete driver
architecture featuring a 4-tap FFE PAM-4 output stage. The design of
the FFE topology was mainly based upon simulations conducted on the
small-signal equivalent of the driver-VCSEL interface, which was already
discussed in the previous chapter. Problems of the first generation MIRAGE
driver—related to clock synchronization—were tackled by following a
comprehensive design methodology for retiming circuits. A reusable generic
current mirror was widely deployed inside the data path to save time and
will therefore be analyzed in detail to verify reliable operation. Finally, some
techniques to compensate the non-linear modulation behavior of VCSELs
are proposed. The chapter ends with electrical and optical link experiments
of the PAM-4 driver, evaluating the effectiveness of the FFE and comparing
the performance of PAM-4 with NRZ modulated VCSELs.
Table 4.1: Simplified analysis of a PAM-M modulated optical link, susceptible to additive
white Gaussian noise only, shows that PAM-4 has the smallest SNR penalty compared to
NRZ modulation. The noise power is inversely proportional to the spectral efficiency by
assuming a brick-wall filter at the receiver with a cut-off frequency that scales with the baud
rate.
M levels                                   4      8     16
M − 1 decision thresholds                  3      7     15
Spectral efficiency log2 M (bits/Hz)       2      3      4
Signal power penalty (dB)                4.8    8.5   11.8
Noise power penalty (dB)                  -3   -4.8     -6
SNR penalty (dB)                         1.8    3.7    5.8
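The entries of Table 4.1 appear to follow from two simple expressions, which the sketch below evaluates. It assumes that the signal power penalty equals 10·log10(M − 1), since the optical eye opening shrinks by a factor M − 1, and that the noise power penalty equals −10·log10(log2 M), since the receiver bandwidth scales with the baud rate. These formulas are inferred from the tabulated values rather than stated in the text, and the function name is hypothetical.

```python
import math


def pam_penalties_dB(levels):
    """Signal, noise and SNR penalty (dB) of PAM-M relative to NRZ (assumed model)."""
    signal_penalty = 10.0 * math.log10(levels - 1)          # M - 1 stacked eyes
    noise_penalty = -10.0 * math.log10(math.log2(levels))   # bandwidth reduced by log2 M
    return signal_penalty, noise_penalty, signal_penalty + noise_penalty


for m_levels in (4, 8, 16):
    sig, noise, snr = pam_penalties_dB(m_levels)
    print(f"M = {m_levels:2d}: signal {sig:5.2f} dB, noise {noise:5.2f} dB, SNR {snr:5.2f} dB")
```

The computed values agree with the table to within rounding of the individual entries.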
4.1 Architecture
4.1.1 Specifications
Experience gained from experiments with the first generation driver¹ helped
in determining the specifications for the second generation PAM-4 driver,
especially regarding the bias and modulation current ranges. The driver
ideally targets 80 Gb/s, as listed in Table 4.2. The maximum drive current ranges
from 4 to 16 mA to accommodate TUM VCSELs with a BTJ of 3-5 µm, which
¹ The first generation MIRAGE driver is described in an excerpt from the Photonics
Technology Letters paper as part of the ‘Experiments’ section of this chapter, see Section 4.5.1.
are characterized by a roll-off current of 7 mA and 18 mA respectively. Unfortunately,
the variety of VCSELs available in the project makes it difficult to tailor the
design of the driver to a certain drive current, which sacrifices high-speed
performance.
Table 4.2: Target specifications for the second generation MIRAGE VCSEL driver
Parameter Min Typ Max
Modulation PAM-4
Baud rate (GBd) 25 40
Rise time 10-90% (ps) 16.6
Average laser current (mA) 4 9 16
Threshold laser current (mA) 1 2
Threshold laser voltage (V) 0.9 1
Supply voltage driver (V) 2.375 2.5 2.625
Anode voltage (V) 3.1 4
VCSEL cathode pitch (µm) 350
Driver pad pitch (µm) 150 200
The VCSELs are diced into common-anode arrays with a cathode pitch of
350 µm and are available in 2-by-1 and 4-by-1 configurations. Since two
data inputs, the least significant bit (LSB) and the most significant bit (MSB), are
required per channel to generate a PAM-4 signal, the smallest channel pitch
of the driver is mainly limited by the bond pad spacing of these data inputs.
The first version used differential signals (ground-signal-signal-ground) with
a pad pitch² of 150 µm, resulting in a channel pitch of 900 µm (comprising
the LSB and MSB inputs) [46]. The large discrepancy with the cathode pitch
would result in a long fan-out trace for a quad-channel driver. Hence, the
decision to go for a dual-channel driver was quickly made. For the second
version, the differential approach is traded in for a single-ended scheme for a
few reasons. First of all, the in-house test pattern generators were equipped with
only one differential output and required the insertion of a splitter to drive
the two differential inputs. Secondly, the circular fan-out of all high-speed
PCB connectors, ten in total, led to long PCB traces accompanied by
high transmission losses. The single-ended topology eliminates the splitter
from the setup and allows reducing the fan-out radius since the number
² The pad pitch of 150 µm corresponds to 80 µm wide traces with a clearance of 70 µm.
These were also the minimum dimensions specified by the printed circuit board manufacturer.
Figure 4.1: Block diagram of the data path for one channel of the dual-channel PAM-4
driver (diagram labels: D/Q flip-flops, SFEP, 4-tap output driver, 160 Ω, vdd1, vdd2, VCSEL,
chip boundary).
of connectors is halved. Unfortunately, single-ended transmission lines
typically need to be dimensioned wider to approximate a characteristic
impedance of 50 Ω. This leads to a pad pitch of 200 µm and a channel pitch
of 800 µm, which is still not compatible with the cathode pitch. Therefore,
the second generation driver is also restricted to integration with 2-by-1
VCSEL arrays.
4.1.2 Channel architecture
The channel architecture of the high-speed data path is shown in Fig. 4.1.
Two single-ended binary data streams, referred to as MSB and LSB, are
synchronized with each other through retiming data flip-flops (DFFs) before
they are combined at the output stage. Dimensioning the tap coefficients A0
to A3 in the MSB path twice as large as in the LSB path creates the multi-level
current. This current can be predistorted by changing the magnitude and the
sign of the coefficients A0 to A3. Each tap can be enabled and configured
on demand, which allows the optimal configuration to be determined based on
power efficiency and BER measurements. The delay between consecutive
FFE taps is derived from a cascade of DFFs and corresponds to one symbol
period. Since the time delay between the tap coefficients scales with the
data rate, the tap coefficients need to be recalculated accordingly. The
equalization can therefore be effective for lower data rates as well. Moreover,
this topology consumes less power and area than a symbol-spaced delay
implemented with analog delay cells.
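The way the MSB and LSB streams combine into an equalized multi-level current can be sketched behaviorally. The snippet below is a simplified illustration and not a model of the actual CML output stage: it assumes ideal symbol-spaced delays, weights the MSB path twice as heavily as the LSB path, and expresses the result as a modulation current on top of the bias. The function name and the unit current i_lsb_mA are hypothetical.

```python
def pam4_ffe_current(msb, lsb, taps, i_lsb_mA=1.0):
    """Behavioral 4-level FFE output: sum over taps A(n) of delayed PAM-4 levels.

    msb, lsb : equally long sequences of 0/1 bits
    taps     : tap coefficients A0..A(N-1), sign included
    returns  : modulation current per symbol, relative to the bias (mA)
    """
    levels = [2 * m + l for m, l in zip(msb, lsb)]   # MSB weighted twice the LSB
    output = []
    for k in range(len(levels)):
        acc = 0.0
        for n, a_n in enumerate(taps):
            if k - n >= 0:                           # symbol-spaced delay via DFF chain
                acc += a_n * levels[k - n]
        output.append(i_lsb_mA * acc)
    return output


# Example: highest level followed by the lowest level, with a negative post-cursor tap.
print(pam4_ffe_current(msb=[1, 0, 0], lsb=[1, 0, 0], taps=[1.0, -0.25]))
```

In this example the negative second tap undershoots the symbol that follows the highest level, which is the pre-emphasis effect of the FFE on a falling transition.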
Setup and hold time violations are resolved by preceding the FFE chain
with an additional flip-flop and programmable delay cells. These delay cells
also serve as a single-ended to differential converter with the dummy side
connected through 50 Ω to the supply voltage node Vdd1. The reference clock
CLK is converted to a differential signal and amplified to drive the MSB
and LSB flip-flops. Both clock and data signals propagate from the bonding
pad to the input stage through 50 Ω on-chip transmission lines to minimize
transmission losses and reflections. The motivation to dimension the
back-termination resistor Rout at 160 Ω is based on the equalization flow described
in Chapter 3. Typically, the falling edge of the VCSEL's transient response
is slower than the rising edge due to the relaxation oscillation, leading to
pulse width distortion (PWD). This non-linear effect can close the bottom
eye in a PAM-4 eye diagram, effectively limiting the performance, but it can
be compensated by inserting a selective falling-edge pre-emphasis
(SFEP) block in the channel. This falling-edge detection circuit senses
the instant when both data inputs MSB and LSB become logic low and
generates a falling pulse accordingly. The magnitude of the overshoot on
the rising edge also depends on the inductance present in the return-current
loop. By providing on-chip decoupling Ca for the anode voltage, the
return-current path is made shorter, which helps to decrease the total
inductance.
4.2 Retiming
4.2.1 Introducing the synchronization window
An important characteristic of a DFF is the metastable state that can occur
when the data signal transitions inside the forbidden setup and hold region,
i.e. when the data-to-clock delay (tdc)³ becomes too small or too big relative
to the bit period Tbit. In this metastable state, the output is no longer well defined
and the input signal is retimed incorrectly. We can also
define a stable state, described by a fixed relation between the output and the
clock input, denoted by the clock-to-output delay (tcq). This stable state can be
represented by a synchronization window in the time domain with period
Tsync, see Fig. 4.2a.
Figure 4.2: (a) A flip-flop provides a stable output if the input signal transitions outside the
setup and hold regions. (b) Test bench for flip-flop cascade explaining the timing relations. (c)
The output signal q1 shows metastability when setup and hold violations occur, characterized
by a distorted waveform and increased tcq delay (amplitude in mV versus time in ps; traces for
the clock, for data with tdc = 16 ps, 10 ps and 4.5 ps, and for q1 with tcq at its reference
value, +5% and +55%).
Tsync is defined in this work as the window where a 5% variation is tolerated
on tcq. As can be observed in Fig. 4.2c, restricting the variation to less
than 5% preserves the shape of the waveform. The corresponding test bench
explaining the abbreviated timing relations between data, clock and output
is depicted in Fig. 4.2b. A fundamental condition on the sync window is
that Tsync must be larger than the combined peak-to-peak jitter on the input
data and clock signal. Enough margin must be provided to sustain
variations of the clock amplitude and to allow operation across process, voltage
and temperature (PVT) corners.
³ The timing parameters, such as tdc, correspond to the delay between the active data
edge and the active clock edge on which the flip-flop/latch triggers. This can be the falling
edge or rising edge depending on the context.
4.2.2 Troubleshooting the first generation retiming block
The first generation PAM-4 driver struggled to obtain baud rates higher than
32 GBd. On rare occasions, most often after lengthy fiddling
with external delay adjusters and clock amplifiers, baud rates up to 36 GBd
could be reached. This implied that the problematic behavior at high
speeds was related to the on-chip retiming stages experiencing metastable
states. Figure 4.3 depicts how this phenomenon translates to eye diagrams; it
either shows up as noise inside the eye or as shifted eyes in a multi-level signal.
Figure 4.3: Closed or distorted eye diagram measured from the probed output of the first generation
driver at (a) 40 Gb/s NRZ (28 mV/div, 10 ps/div), revealing metastable flip-flop operation, and
at (b) 25 GBd PAM-4 (52 mV/div, 10 ps/div), showing bad synchronization between LSB and MSB.
In order to estimate the jitter on the input signal, an experiment was built that
emulates the source signal at the driver inputs. More specifically, the 4 cm
differential PCB trace was excited single-endedly at the positive input, while
the negative input was terminated, and probed differentially at the end of
the trace, see Fig. 4.4a. An eye diagram measured at 40 Gb/s shows that
the peak-to-peak jitter amounts to approximately 8 ps, while the full rise
time of the signal corresponds to 20 ps, see Fig. 4.4b. This information can
now be used in combination with Cadence Virtuoso transient simulations to
determine the remaining sync window after subtracting the input jitter value.
Figure 4.4: (a) A single-ended input/differential output probing setup revealed that the jitter on
the 40 Gb/s signal entering the PAM-4 driver chip amounts to 8 ps, see (b) (100 mV/div, 10 ps/div).
A worst-case environment is constructed by simulating at a substrate
temperature of 70 °C, including a supply voltage drop of 5%, providing a 40 Gb/s
input signal with 20 ps rise time and operating in a 3-σ corner extracted from
tcq. The tcq specification is plotted in Fig. 4.5 versus tdc for various clock
amplitudes, allowing a clear visualization of the sync window of the flip-flop
circuit. A clock amplitude of 150 mV results in a Tsync of 9.5 ps, which drops
to 7 ps for a 50 mV clock amplitude. Taking into account data and clock
jitter, the sync window vanishes and measurements similar to Fig. 4.3 can be
expected. Hence, the design of the retiming circuits has to be re-evaluated,
starting with the derivation of the circuit specifications in the next section.
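The remaining margin can be estimated with simple arithmetic by subtracting the peak-to-peak jitter from the simulated sync window. The sketch below uses the Tsync values and the 8 ps data jitter quoted above; the clock jitter is not quantified in the text and is therefore left as a parameter that defaults to zero, which is an optimistic assumption. The function name is hypothetical.

```python
def remaining_sync_window_ps(t_sync_ps, data_jitter_pp_ps, clock_jitter_pp_ps=0.0):
    """Sync window left after subtracting the combined peak-to-peak jitter (ps)."""
    return t_sync_ps - (data_jitter_pp_ps + clock_jitter_pp_ps)


# Simulated sync windows for the two clock amplitudes, 8 ps measured data jitter.
for clock_mV, t_sync in ((150, 9.5), (50, 7.0)):
    margin = remaining_sync_window_ps(t_sync, data_jitter_pp_ps=8.0)
    status = "positive margin" if margin > 0 else "window closed"
    print(f"{clock_mV} mV clock: {margin:+.1f} ps ({status})")
```

Even without clock jitter, the margin at a 150 mV clock is only 1.5 ps and is already negative at 50 mV, so any additional clock jitter closes the window.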
Figure 4.5: The clock amplitude has a strong effect on the sync window with Tsync varying
from 7 ps to 9.5 ps at 40 Gb/s (clock-to-output delay tcq in ps versus data-to-clock delay tdc
in ps for clock amplitudes of 50 mV and 150 mV; the tcq +5% sync limit is marked for the
150 mV clock).
4.2.3 Deriving flip-flop specifications
The DFF implemented in the PAM-4 driver IC consists of a master latch that
flips and a slave latch that flops, see Fig. 4.6a. Each latch has a differential
data input, a differential enable/clock input and a differential data output.
Hence, the timing relations between these three signals must be specified:
tcx, tdc and tdx, where tdx refers to the data-to-output delay of the master
latch. Another requirement can be imposed on the amplitude of the signals,
which is necessary for designing the source and load circuits.
Figure 4.6: (a) A DFF is composed of a master and a slave latch. (b) Case where td0 < tc0.
(c) Case where td0 = tc0 + Tbit/4.
Two cases can be considered depending on the position of the data transition
relative to the clock signal. A first case, drawn in Fig. 4.6b, assumes that the
data is already settled before the rising edge of the clock occurs, i.e. before
the latch becomes transparent. This means that the output of the master
latch solely depends on the clock-to-output delay tcx. Correct retiming is
accomplished if the output of the subsequent latches is also determined by
the clock-to-output delay, which happens if tcx, tcq < Tbit/2.
Table 4.3: Specifications for a 40Gb/s DFF with k = 0.5.
Parameter Min Typ Max
tcq (ps) 12.5
tdx (ps) 7.5
Data rate (Gb/s) 40
data_diff,pp (mV) 300 400
clock_diff,pp (mV) 200 300
q_diff,pp (mV) 300 400
The second case verifies the speed of the master latch in transparent mode,
shown in Fig. 4.6c. Full transparency takes place at the positive peak of
the clock signal and thus requires a quarter-bit period delay between clock
and data. The output of the master latch will experience a delay tdx and
is typically smaller than tcx. If tdx < Tbit/4, then the output of the slave is
defined by tcq and the data is already synchronized with the clock, similar
to case one. On the other hand, if tdx > Tbit/4, then the slave output is
defined by txq and is still data-dependent. A third cascaded latch is therefore
necessary for clock synchronization and will require that tdx + txq < 3·Tbit/4.
The master output x does not transition at the instant where the slave is fully
transparent, therefore txq > tdx. Conveniently, a speed reduction factor
k can be introduced where txq = (1 + k) · tdx. Hence, a specification for
the data-to-output delay of the master latch can be derived: tdx < 3·Tbit/(4·(2 + k)).
The timing analysis ends with finding an empirical value for k. Logically
speaking, the smaller k is chosen, the faster the latch needs to operate and
the more power it will consume. Experience shows that choosing k = 0.5
provides a reasonable trade-off between rise time and power consumption.
Table 4.3 gives an overview of the specifications used for the design of the
40Gb/s DFF. Previous analysis shows that cascading multiple latches can
be necessary to achieve a stable state given a certain tdc. The next section
will therefore study the impact on the sync window when cascading multiple
latches, allowing us to determine how many latches are optimally cascaded.
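The timing bounds derived in this section can be checked numerically. The sketch below is only an illustration: the function name and the dictionary keys are hypothetical, and the two inequalities are taken directly from the text (tcx, tcq < Tbit/2 and tdx < 3·Tbit/(4·(2 + k))).

```python
# Illustrative check of the flip-flop timing bounds (all times in picoseconds).
# Function and key names are hypothetical; only the inequalities come from the text.

def dff_timing_specs(data_rate_gbps: float, k: float) -> dict:
    """Return the upper bounds on t_cq and t_dx for a data rate and speed reduction factor k."""
    t_bit = 1e3 / data_rate_gbps              # bit period in ps (40 Gb/s -> 25 ps)
    t_cq_max = t_bit / 2                      # case one: t_cx, t_cq < T_bit / 2
    # case two with a third latch: t_dx + t_xq < 3*T_bit/4 with t_xq = (1 + k) * t_dx
    t_dx_max = 3 * t_bit / (4 * (2 + k))
    return {"t_bit": t_bit, "t_cq_max": t_cq_max, "t_dx_max": t_dx_max}


specs = dff_timing_specs(40, 0.5)
print(specs)
```

For 40Gb/s and k = 0.5 this gives the 12.5 ps and 7.5 ps entries listed in Table 4.3.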
[Plot: clock-to-output delay tcq (ps) versus data-to-clock delay tdc (ps) for cascades of 2, 3, 4 and 5 latches, with the tcq1 + 5% sync limit indicated.]
Figure 4.7: Cascading multiple latches can significantly increase the sync window compared
to the standard dual-latch flip-flop.
4.2.4 Cascading multiple latches
It is clear by now that the first generation retiming topology from Sec-
tion 4.2.2 is inadequate for operation at 40GBd. One way to enlarge Tsync is
by improving the setup and hold time of a flip-flop. This can be achieved by
making the design more broadband, but in the end, only a few picoseconds
extra can be gained for Tsync. A more robust method is to cascade multiple
latches and observe the output of the last latch. The preceding latches act as
a retiming buffer in this configuration. This is in contrast with the original
PAM-4 driver chip where the output of the first flip-flop q1 was directly
connected to the first tap of the equalizer. The sync windows derived after
evaluating the output of the internal latches from the test bench in Fig. 4.2b
are plotted in Fig. 4.7. It can be observed that tcq is significantly less
influenced by tdc if retiming buffers are added in addition to the standard
master-slave flip-flop, consisting of only two latches. Figure 4.8 reveals
that Tsync can be roughly doubled by cascading four latches compared to
the original dual-latch configuration, whereas the marginal improvement of
cascading a fifth latch is not worth the added complexity. The differential
clock amplitude is preferably between 150mV and 200mV, albeit a 2 ps
penalty on Tsync is still acceptable when the amplitude degrades to 100mV.
[Plot: sync window Tsync (ps) versus clock amplitude (mV) for cascades of 2, 3, 4 and 5 latches.]
Figure 4.8: Tsync can be roughly doubled by cascading four latches compared to the original
dual-latch configuration, while the optimal performance occurs at a clock amplitude of
150mV.
The enhanced sync window, accomplished by the additional flip-flop buffer,
allows relaxing the specifications for each flip-flop. This can significantly
reduce power consumption as a total of ten flip-flops are implemented in the
final PAM-4 driver from Fig. 4.1. Although the power consumption per
flip-flop is approximately the same between the two design iterations, the
second version is able to deliver 30% more output swing. Each flip-flop
in the cascade from Fig. 4.1 is biased with the same tail current. However,
further optimization is still possible by biasing e.g. the first flip-flop and the
consecutive flip-flops with different tail currents, as demonstrated by a 17%
improvement in energy efficiency [76].
The final circuit for each flip-flop is shown in Figure 4.9 and comprises a
master and a slave CML latch, being a modified version of the one in [74]. The
flip-flops that interface with the predriver are equipped with an emitter-follower
output stage to shield the predriver input from the next flip-flop input. Power
can be saved by halving the current through the buffer stage Ib if it is located
in the LSB data path. Current matching and output impedance of the Ilatch
current sources are improved by degenerating with a 160mV voltage drop
while restricting the required area by opting for diffusion resistors. The
total area of the flip-flop is further minimized by delivering Ib and the base
voltage Vb of cascode transistor Q4 from a generic PAM-4 current mirror
circuit. This biasing technique is applied throughout the retiming and driver
stages, allowing close cascading of all stages, and will be elaborated in a
later section.
Figure 4.9: Circuit schematic of the final CML master-slave flip-flop with a buffered output
to shield the predriver input from the next flip-flop input.
The worst-case characteristics of the flip-flop are verified at a substrate
temperature of 70 °C, including a 5% supply voltage drop, adding
post-layout parasitics, loading the flip-flop with a NOR-gate and a predriver
and applying a 300mVpp differential input signal with a rise time of 20 ps
and a 200mVpp differential clock signal. The achieved characteristics
summarized in Fig. 4.9 all comply with the derived specifications at 40Gb/s
from Table 4.3. In the end, a worst-case synchronization window of 19 ps is
obtained for a cascade of two flip-flops (four latches), which is over 75% of
the bit period at 40Gb/s and corresponds to a twofold increase compared to
the original dual-latch configuration.
4.2.5 Avoiding the setup and hold region with delay stages
Despite the fact that the synchronization window of the retiming chain is
maximized, the input signal can still transition inside the forbidden setup
and hold region. An on-chip clock and data recovery (CDR) circuit could
resolve this issue by ensuring a fixed time relation between the input data
and the clock signal. Unfortunately, this is a topic that falls beyond the
scope of this thesis. An alternative is to manually manipulate the timing
relation by inserting programmable delay stages in front of the flip-flops.
The required time delay depends on the setup and hold time region tsu + th
as well as on the peak-to-peak jitter of the input signal. Exciting the flip-flop
from Fig. 4.9 with a jitter-free input signal imposes a lower bound
on the time delay that amounts to almost 6 ps. The upper bound is simply
the sum of the peak-to-peak input jitter and the setup and hold
time region and corresponds to roughly 16 ps assuming worst-case conditions.
Figure 4.10: Circuit schematic of the delay cell consisting of a slow and a fast path
accommodated by enabling current sources Is1-Is4 or If1-If2 respectively.
An ideal time delay td is represented in the Laplace domain as H(s) = e^(−s·td)
and introduces a phase shift equal to −ω·td [77]. The exponential transfer
function converted to a rational function requires an infinite pole-zero
representation through a Taylor series expansion and is therefore not feasible
to implement. The best possible approximation to the all-pass transfer
function of the ideal time delay can be calculated with an n-th order Padé
approximant, where n ≤ 2 is typically used for active delay cell circuits.
For example, the first-order approximation is given by
\[ H(s) \approx \frac{1 - s\,\frac{t_d}{2}}{1 + s\,\frac{t_d}{2}} \]
The frequency response of the associated group delay equals
\[ \tau_g(\omega) = \frac{t_d}{1 + \left(\omega \cdot \frac{t_d}{2}\right)^2} \]
and is characterized by a delay-bandwidth product equal to 0.106 when considering
10% dispersion on the group delay [78–80]. Considering a quasi-flat group
delay up to the Nyquist bandwidth of 20GHz would result in a delay of
td = 5.3 ps. Cascading delay stages or opting for a second-order approximant
will therefore be necessary to acquire time delays up to 16 ps.
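The delay-bandwidth product of 0.106 and the resulting 5.3 ps can be reproduced with a few lines of code. The sketch below assumes the first-order all-pass H(s) = (1 − s·td/2)/(1 + s·td/2), whose group delay is td/(1 + (ω·td/2)²); the function names are hypothetical.

```python
import math

# Group delay of the first-order Pade all-pass H(s) = (1 - s*t_d/2) / (1 + s*t_d/2).
def pade1_group_delay(f_hz: float, t_d: float) -> float:
    w = 2 * math.pi * f_hz
    return t_d / (1 + (w * t_d / 2) ** 2)


# Product f * t_d at which the group delay has dropped by the given fraction.
def delay_bandwidth_product(dispersion: float) -> float:
    # Solve t_d / (1 + x^2) = (1 - dispersion) * t_d with x = pi * f * t_d.
    x = math.sqrt(1 / (1 - dispersion) - 1)
    return x / math.pi


dbp = delay_bandwidth_product(0.10)      # 10% dispersion on the group delay
t_d_max = dbp / 20e9                     # largest delay that stays flat up to 20 GHz
print(f"delay-bandwidth product: {dbp:.3f}")
print(f"maximum delay at 20 GHz: {t_d_max * 1e12:.1f} ps")
```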
As the absolute tunable range of td amounts to 10 ps and is quite large, being
almost 200% of the minimum time delay, a different approach was pursued
that is based on relative time delays in parallel circuits. The delay circuit
from Fig. 4.10 comprises a slow and a fast path, selected by enabling current
sources Is1-Is4 or If1-If2 respectively. The slow path consists of two cascoded
CML stages buffered intermediately with an under-biased emitter follower
stage, whereas the fast path consists of only one cascoded CML stage. Both
outputs are combined into the low-impedance emitter of cascode transistors
Q5. The time delay td is therefore equal to the difference in absolute delay
between the slow and the fast path and amounts to around 10.8 ps. It is
larger than the specified lower bound because the circuit is a
design re-use from the original PAM-4 driver.
Figure 4.11: Setup and hold region diagram for a bit period of 25 ps reveals that a cascade
of three delay stages is necessary to shift a jittery data block back into the synchronization
window.
The number of delay stages in the cascade can now be derived based
on the knowledge of td = 10.8 ps, th = 1.6 ps, tsu = 4.2 ps, tppjitter = 10 ps
and Tbit = 25 ps. A worst-case scenario occurs when the jittery data signal
intersects with the boundary of the hold region, as visualized in Fig. 4.11.
The timing diagram clearly shows that three delay periods td are needed
to shift the data block back into the synchronization window. Therefore, a
cascade of three delay stages will precede the retiming chain from Fig. 4.1
to avoid setup and hold violations under all circumstances.
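The worst-case scenario can be replayed numerically. The sketch below is one possible reading of the timing diagram, not a reproduction of it: time is measured from the sampling clock edge, the forbidden region is assumed to span from −tsu to +th around every edge, and the jittery data block is modelled as an interval of width tppjitter. The function names and the example start position of 1.5 ps (just inside the hold region) are assumptions.

```python
# Values from the text, in picoseconds.
T_BIT, T_SU, T_H = 25.0, 4.2, 1.6
T_D, T_JITTER = 10.8, 10.0


def block_is_safe(start: float) -> bool:
    """True if the block [start, start + T_JITTER] avoids the setup and hold region.

    The start position is folded into one bit period; the sync window is
    assumed to run from T_H up to T_BIT - T_SU.
    """
    start %= T_BIT
    return start >= T_H and start + T_JITTER <= T_BIT - T_SU


def stages_needed(start: float, max_stages: int = 10):
    """Smallest number of delay periods T_D that moves the block into the sync window."""
    for n in range(max_stages + 1):
        if block_is_safe(start + n * T_D):
            return n
    return None


# Block whose leading edge sits just inside the hold region (hypothetical position).
print(stages_needed(1.5))
```

This only checks a single starting position; it does not attempt to prove coverage for every possible data phase.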
Circuit dimensions were derived from large-signal simulations while signal
integrity was sacrificed for lower power operation. Hence, the flat group-delay
characteristics required to approximate an ideal time delay are not obtained.
Figure 4.12c reveals that the slow path has an insufficient
10% group delay bandwidth of 14GHz, while for the fast path, it is above
20GHz. The total delay of the cascade can also be derived from this plot
and amounts to 33 ps. Although the delay stages are fully switching, the
aggregate bandwidth of the cascade is not sufficient for 40Gb/s operation,
showing up as ISI and heavy peak-to-peak jitter, as depicted in the eye
diagrams from Fig. 4.12a and Fig. 4.12b. In order to fully exploit the sync
window, a preamplifier for the flip-flop will be necessary to clean up the
signal.
[Plots: (a), (b) eye diagrams, voltage (V) versus time (ps); (c) group delay (ps) versus frequency (Hz) for the slow and the fast triple delay path.]
Figure 4.12: 40Gb/s simulation of a triple delay stage cascade while observing the output
of the slow path (a) and the fast path (b). (c) Group delay difference between the two paths
amounts to roughly 33 ps.
4.2.6 Cleaning up the delayed signal with a continuous-time linear equalizer (CTLE) stage
The signal integrity issues shown in Fig. 4.12 can be resolved by buffering the
delayed output with a CTLE, such as a capacitively degenerated differential
amplifier as shown in Fig. 4.13a. It has been shown to be capable of altering
the group delay characteristics [42], as well as boosting the bandwidth by
creating a peaked frequency response [81]. Two circuit parameters Ce and
Ib are under inspection to accommodate frequency response tuning, while
Rc and Re are fixed to maintain an ideal closed-loop differential gain of
two. The outcome on the magnitude and the group delay is plotted in
Fig. 4.13. Ce determines together with Re the location of the zero and
therefore is able to shift the resonance frequency as well as enlarge the
group delay variation relative to 0 ps. In contrast, tail current Ib only
affects the negative portion of the group delay while it can also increase
the gain and the peaking behavior at more or less a fixed resonance frequency.
The eye diagram from Fig. 4.12a suggests that the high-frequency transitions
occur sooner in time compared to the transitions with lower frequency
content. Inverting this behavior corresponds to a group delay response that
transforms from negative to positive value as frequency rises. This matches
very well with the CTLE response from Fig. 4.13b when adapting Ce or Ib.
Since a current source is far more flexible to tune than a capacitor, Ce is
fixed and Ib varies from 400-600 µA in the final circuit. The equalization
performance is confirmed by observing the eye diagrams for both the slow
and the fast delay path in Fig. 4.14. Significantly less ISI is noticeable,
especially for the slow path, which will certainly be beneficial for retiming
at 40Gb/s.
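The sign change of the group delay can be illustrated with a strongly simplified CTLE model. The sketch below assumes a single zero and a single pole, H(s) = (1 + s/ωz)/(1 + s/ωp) with ωz < ωp, and places the zero at 1/(Re·Ce). The value of Re, the pole position and the function names are hypothetical, and the peaking and higher-order poles of the actual circuit are not modelled.

```python
import math


def zero_frequency(r_e: float, c_e: float) -> float:
    """Zero frequency in Hz, assuming the zero sits at 1 / (R_e * C_e)."""
    return 1 / (2 * math.pi * r_e * c_e)


def ctle_group_delay(f_hz: float, f_zero: float, f_pole: float) -> float:
    """Group delay -d(phase)/d(omega) of H(s) = (1 + s/wz) / (1 + s/wp), in seconds."""
    w, wz, wp = (2 * math.pi * x for x in (f_hz, f_zero, f_pole))
    pole_term = (1 / wp) / (1 + (w / wp) ** 2)   # the pole adds delay
    zero_term = (1 / wz) / (1 + (w / wz) ** 2)   # the zero removes delay
    return pole_term - zero_term


R_E = 200.0                       # hypothetical degeneration resistance (ohm)
f_z = zero_frequency(R_E, 80e-15)
f_p = 4 * f_z                     # hypothetical pole position
for f in (1e9, math.sqrt(f_z * f_p), 100e9):
    print(f"{f / 1e9:6.1f} GHz: {ctle_group_delay(f, f_z, f_p) * 1e12:+.2f} ps")
```

In this model the group delay is negative at low frequencies, crosses zero at the geometric mean of the zero and pole frequencies, and becomes positive beyond it. Increasing Ce lowers the zero frequency and therefore enlarges the negative low-frequency portion.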
[Plots: (a) CTLE circuit schematic; (b) group delay (ps) and (c) magnitude (dB) versus frequency (Hz) for Ib = 400 µA, Ce = 80 fF; Ib = 400 µA, Ce = 120 fF; and Ib = 600 µA, Ce = 120 fF.]
Figure 4.13: (a) A CTLE stage that can modify (b) the group delay properties and (c)
increase the bandwidth or the gain by adapting Ce and/or Ib.
[Plots: (a), (b) eye diagrams, voltage (V) versus time (ps).]
Figure 4.14: 40Gb/s simulation of a CTLE stage acting on the output of a triple delay cell
cascade to reduce ISI with Ib = 400 µA: (a) slow path, (b) fast path.
4.2.7 Buffering the clock signal
As noticed in Fig. 4.8, the clock amplitude has a large impact on the sync
window Tsync. Indeed, a minimum differential clock amplitude of 150mV
or 300mVpp is necessary for optimal retiming. This should be sustained
regardless of the number of flip-flops enabled, which is proportional to the
number of FFE taps and ranges between two and five flip-flops. The clock
amplifier cascade, drawn in Fig. 4.15, is made symmetrical for both channels
and consists of four amplifiers per channel. The channels are isolated
from each other through the input stage A1. As the clock signal CLK
essentially splits up in half, the single-ended 50 Ω interface is modified to a
100 Ω interface. This is realized by inserting after the split a custom 100 Ω
transmission line on metal layer 4 with metal layer 2 serving as ground plane
and is resistively terminated at the end with its matched impedance of 100 Ω.
Amplifier A1 comprises a common-emitter differential pair connected to
emitter followers and acts as a single-ended to differential converter. The
large capacitive load of the MSB and LSB flip-flops is buffered with a
parallel combination of amplifier A3. High-frequency peaking is obtained by
implementing capacitive emitter degeneration combined with heavily biased
emitter followers. Finally, A2 is an intermediate stage that is equipped with
the same bandwidth enhancement techniques as in A3.
Figure 4.15: The clock buffer chain, which is identical for both channels, converts a
single-ended clock signal into a differential signal capable of driving up to 5 flip-flops (FFs).
The performance of the cascade is analyzed through small-signal and large-signal
excitation, see Fig. 4.16. Worst-case results are predicted by including
post-layout views of the flip-flops and the clock buffers while operating in an
extracted 3-σ corner related to the gain at a substrate temperature of 70 °C.
Interstage parasitics are taken into account by adding 5 fF of capacitance to
the output nodes. AC-analysis reveals that the gain peak occurs at approximately
32GHz, after which the gain drops quickly with a 60 dB/octave roll-off.
Above all, the capacitive loading of 2 versus 5 flip-flops introduces a 5 dB
penalty at 40GHz. This can become problematic under large-signal excitation
as the output of the last buffer A3 can drop below the required 300mVpp
amplitude. Increasing the CLK amplitude at the input stage A1 does not
solve this issue as the second stage already operates close to its limiting region.
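To put the 5 dB penalty in perspective, it can be converted into a voltage ratio. The snippet below is a plain unit conversion under small-signal assumptions; it does not model the limiting behavior of the buffers, and the function name is hypothetical.

```python
def db_to_amplitude_ratio(db: float) -> float:
    """Convert a gain in dB to a voltage (amplitude) ratio."""
    return 10 ** (db / 20)


penalty = db_to_amplitude_ratio(-5.0)
print(f"a 5 dB drop scales the small-signal amplitude by {penalty:.2f}")
```

A 5 dB drop therefore leaves only slightly more than half of the small-signal clock amplitude when all five flip-flops are driven.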
[Plots: (a) magnitude (dB) versus frequency (Hz) for 2 and 5 flip-flops; (b) output amplitude (mVpp) versus input amplitude of A1 (mVpp) for buffer A1, buffer A2, and buffer A3 loaded with 2 and 5 flip-flops.]
Figure 4.16: (a) The small-signal gain of the clock buffer cascade experiences a 5 dB drop at
the gain peak and at 40GHz when driving all flip-flops. (b) Post-layout 3-σ corner simulation
at 40GHz evaluating the clock amplitude after each buffer.
It can be concluded from the previous analysis that the drive strength of
the clock buffers could be a bottleneck to achieve 40GBd with
a multi-tap FFE. However, baud rates close to 32GBd should certainly be
feasible.
4.3 4-tap PAM-4 driver architecture
4.3.1 Output driver
The output driver is split up in an MSB and LSB data path to create a PAM-4
output current, see Fig. 4.1 for the structure of the data path. The multi-level
output current can be predistorted by a 4-tap FFE filter acting on both data
paths, in order to compensate the modulation behavior of the VCSEL. More
specifically, tap A0 acts as the main cursor and taps A1 to A3 can be adapted
as post-cursors. Another configuration can consist of disabling A0 and using
A1 to A3 as precursor, main cursor and post cursor. The time delay between
consecutive taps is derived from the flip-flop cascade and corresponds to
the bit period. The cursors A1-A3 are implemented as a Gilbert cell of
which the bottom differential pair is excited by the data signal Vi and the top
differential pairs with bias voltages Vsp and Vsm, see Fig. 4.17. These bias
voltages connect Q1 or Q2 to either 2V or to ground, forcing the cascode
transistors into conduction or cut-off. A programmable inversion for the
output current Io is hereby created. The circuit for the main cursor A0 is a
simplified version of the Gilbert cell, where the two upper differential pairs
are replaced by a single cascode pair biased at a fixed voltage Vs. This bias
voltage is derived from a resistive divider and decoupled with Cs for lower
AC impedance. The magnitude of each tap coefficient is determined by the
tail current Ib, which is delivered to Q3 from a generic PAM-4 current
mirror circuit that simultaneously biases the MSB and LSB cursors. This
bias configuration allows placing the tap drivers as close as possible to each
other.
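The predistortion performed by the tap drivers can be mimicked with a discrete-time FIR filter in which the tap spacing equals one symbol period. The sketch below is only a behavioral illustration: the normalized tap weights, the PAM-4 symbol mapping (−3, −1, +1, +3) and the function name are assumptions, whereas in the actual driver the tap magnitudes are set by tail currents and the signs by the Gilbert cells.

```python
def ffe_output(symbols, taps):
    """y[n] = sum_k taps[k] * x[n - k], with x[n] = 0 before the first symbol."""
    out = []
    for n in range(len(symbols)):
        acc = 0.0
        for k, weight in enumerate(taps):
            if n - k >= 0:
                acc += weight * symbols[n - k]
        out.append(acc)
    return out


# Hypothetical setting: main cursor A0 plus one negative post-cursor A1.
taps = [1.0, -0.25, 0.0, 0.0]
symbols = [-3, -3, 3, 3, 3]          # PAM-4 levels, lowest to highest transition
print(ffe_output(symbols, taps))
```

The transition from −3 to +3 is emphasized in the first symbol after the edge and settles to a lower steady-state level afterwards, which is the peaking behavior used to compensate a bandwidth-limited load.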
Figure 4.17: Generic circuit schematic from a cursor A1-A3 in the PAM-4 output driver.
Table 4.4 lists the tail current ranges for each tap coefficient respectively.
The transistors were dimensioned at a current density jcm (footnote 4) being 80-90%
of the current density corresponding to the maximum fT, depending on the
available voltage headroom. This is primarily to avoid the transistor entering
the high-current level injection region over corners, as fT rapidly decreases
beyond this point. Each output stage is resistively degenerated with a voltage
drop of 130mV. This has the advantage that the base-emitter capacitance
of Q0 is bootstrapped, making it far less dependent on the tail current. A
rather constant frequency response can thus be obtained over a range of tap
coefficients, simplifying the design of the preceding stage.
Table 4.4: Output bias and tap driver tail currents of the PAM-4 VCSEL driver.
Driver   MSB          LSB
IA0      3.6-12.6mA   1.8-6.3mA
IA1      0-6mA        0-3mA
IA2      0-6mA        0-3mA
IA3      0-3mA        0-1.5mA
Ivcb     0-8mA        0-8mA
Footnote 4: Current density jcm is defined as the common-mode current I flowing through a
unit emitter area Aeu and is calculated by dividing the current by the total number
of unit emitter areas that characterize the bipolar transistor. The number of unit emitter
areas is calculated by dividing the total emitter area Ae by the minimal emitter length
Lem of 0.6 µm. The total emitter area Ae is found by multiplying the number of parallel
transistors m by the number of emitters Ne and the emitter length Le. To summarize:
jcm = I / #Aeu = I / (m·Ne·Le/Lem).
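The current density definition from footnote 4 translates into a short calculation. In the sketch below, the transistor geometry (m = 2, Ne = 2, Le = 1.2 µm) and the 6 mA tail current are hypothetical values chosen only for illustration; the minimal emitter length of 0.6 µm is taken from the footnote.

```python
L_EM_UM = 0.6   # minimal emitter length from footnote 4, in micrometers


def unit_emitter_areas(m: int, n_e: int, l_e_um: float) -> float:
    """Number of unit emitter areas: m * Ne * Le / Lem."""
    return m * n_e * l_e_um / L_EM_UM


def current_density(i_ua: float, m: int, n_e: int, l_e_um: float) -> float:
    """Current per unit emitter area, jcm = I / (m * Ne * Le / Lem), in microampere."""
    return i_ua / unit_emitter_areas(m, n_e, l_e_um)


# Hypothetical transistor: 2 parallel devices, 2 emitters each, 1.2 um emitter length.
print(unit_emitter_areas(2, 2, 1.2))
print(current_density(6000.0, 2, 2, 1.2))
```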
The input capacitance of the output driver is too large to be directly driven
by a flip-flop. A predriver, comprising a degenerated cascoded differential
pair combined with an emitter follower, provides a scaled-down load for
the flip-flops. In addition, the predrive voltage dynamically follows the tap
coefficient value to avoid overdriving or underdriving the output stage.
The bias current for the VCSEL is generated with the same generic PAM-4
current mirror as the driver stages. The output impedance is boosted by
adding an extra cascode transistor, of which the base is biased with two
diode-connected bipolar transistors, hence leading to a double-cascoded
current source. Output capacitance is minimized by sizing the top cascode
transistor at a jcm of 1000 µA.
The 4-tap FFE MSB and LSB driver architecture is realized in layout by
placing each tap driver side by side and combining the output currents iop and
iom in a bus routing structure. As can be observed from the channel layout
in Fig. 4.18, the total length of this bus depends on the area occupied by the
driver taps. By moving the tail current sources outside of the data path, it
becomes possible to limit this length to the absolute minimum of 260 µm. A
similar methodology was pursued for the predriver and the flip-flop cascade,
imposing stringent area specifications for these circuits. Since there is no
such thing as a free lunch, buses (up to 40 µm wide) for routing the tail
currents will cross the high-speed differential traces creating additional
parasitic capacitance. More specifically, the bus routed on metal 2 and 4
between the flip-flops and the predrivers increases the parasitic capacitance
of the differential trace on metal 3 from 5.5 fF to 8.4 fF. The 2.9 fF of extra
load capacitance is negligible compared to the input capacitance from the
predriver or driver stage, minimizing the penalty introduced by following
this layout methodology.
The single-ended input data (MSB, LSB) propagate to the delay cells through
50 Ω transmission lines. The single-ended transmission line for the clock
input (CLK) is located at the symmetry line of both channels and splits up in
a 100 Ω custom transmission line to reach the clock buffers. The ESD-cells
are placed in the middle of the input transmission lines as an attempt to
reduce the capacitive loading on the bond wire. The channel width amounts
to 1100 µm and is mostly dictated by the large number of biasing blocks.
The complete IC design is limited to two channels since the 300 µm pitch
of the VCSEL array is much smaller and would otherwise result in a long
tapered output trace. One of the main biasing blocks in the channel is a
generic current mirror that is used to deliver the tail currents and emitter
follower currents to the FFE stages. The broad range of operating conditions
challenged the design of the current mirror circuit and required careful
verification. It is therefore described in more detail in the next section.
Figure 4.18: Layout of the 4-tap PAM-4 driver channel starting from the input transmission
lines for MSB, LSB and CLK located on the right side until the output currents iom and iop
located on the left side.
4.3.2 One mirror to rule them all
As denoted in previous sections, the tail current mirrors of the gain stages
are placed outside of the data path. The parasitics resulting from the long
interconnection are isolated from the gain stage with a cascoded bipolar
junction transistor (BJT). Hence, the current mirror needs to generate a
bias voltage for the base of the cascode. In addition, it needs to provide an
accurate current ratio of two between the MSB and LSB stages to obtain
equally spaced PAM-4 levels. And last but not least, due to the high number
of gain stages requiring a tail current and emitter follower bias current (in
total 48!), the PAM-4 current mirror needs to be generic to comply with all
bias scenarios.
The final topology consists of an NMOS-transistor mirror sinking current
from the MSB and LSB cascode transistors Q0 and Q1, see Fig. 4.19.
The mirror is made generic by offering 18 outputs that can be combined
in any way possible; a feature that is best realized with MOS transistors
because of the near infinite gate impedance (footnote 5). The maximum MSB and LSB
current is created by combining twelve outputs for MSB and six outputs
for LSB, resulting in 12.6mA and 6.3mA respectively. In order to allow
sufficient headroom (>100mV) across the cascode transistors under worst-
case conditions, Vdsi needs to remain quite low (>200mV). As this is close
to the drain-source saturation voltage of the NMOS transistor—even when
dimensioned in weak inversion—it will lower the output impedance and
make the output current susceptible to the drain-source voltage. Systematic
mismatch between input and output current is therefore minimized by
enforcing the drain-source voltages (Vdso,Vdsi) equal to each other in a
negative feedback loop comprising an operational amplifier. The amplifier
A0 is a single stage PMOS operational transconductance amplifier (OTA) of
which the output is level shifted to the top rail voltage and buffered with a
class A bipolar stage to source the base current for Q0 and Q1.
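The way the mirror outputs are combined can be illustrated with a small calculation. The sketch below assumes that every NMOS output copies the maximum input current of 1.05 mA and that the PAM-4 level on one output leg is simply the sum of the steered MSB and LSB currents; FFE taps and the VCSEL bias are ignored, and the function names are hypothetical.

```python
def branch_current_ma(n_outputs: int, unit_current_ma: float) -> float:
    """Total current sunk by a cascode that is connected to n mirror outputs."""
    return n_outputs * unit_current_ma


def pam4_levels(i_msb: float, i_lsb: float) -> list:
    """Output current for the four (MSB, LSB) bit combinations, lowest to highest."""
    return [msb * i_msb + lsb * i_lsb for msb in (0, 1) for lsb in (0, 1)]


i_msb = branch_current_ma(12, 1.05)   # twelve outputs combined for the MSB stage
i_lsb = branch_current_ma(6, 1.05)    # six outputs combined for the LSB stage
levels = pam4_levels(i_msb, i_lsb)
print(i_msb, i_lsb, levels)
```

Because the MSB branch carries exactly twice the LSB current, the four levels are separated by equal steps of one LSB current.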
Footnote 5: Biasing a BJT with a fixed Vbe while leaving the collector floating is equivalent
to a forward-biased base-emitter diode. The base current of each floating transistor would
load the input current circuit in an unpredictable way, a scenario that needs to be avoided if
possible.
Figure 4.19: Generic current mirror that operates over a large range of input currents Ii
and provides an adaptable mirror ratio from 1 to 18 by connecting the emitter of a cascode
transistor to the desired number of MOS transistors. It can also be used to generate an
accurate output current ratio (e.g. two for PAM-4) and thus sink current from an MSB and
LSB stage.
DC-analysis: A plurality of MOS transistors is present in the current
mirror to stabilize the biasing voltages across the input current range (50 µA-
1.05mA) and temperature range (27-105 °C) while providing accurate matching
as well. First of all, the design rule manual (DRM) was consulted to find
the MOS transistors with the best inherent matching properties to serve as the
mirror transistors M0-M18. These happened to be low-leakage devices with
a maximum drain-source voltage of 1.2V. Since the power supply voltage
of the driver is fixed at 2.5V, the current mirror circuit must be designed
such that M0-M18 never go into breakdown. Secondly, a combination of
low-Vt and normal-Vt transistors is used in a beta helper configuration, in
order to achieve an almost temperature-invariant voltage at the drain
of M0. This is accomplished by sizing M20 in strong inversion and M19
in moderate to strong inversion. Note that the temperature dependence
of the effective gate-source voltage VEFF = Vgs − Vt is expressed using a
temperature exponent (BEX) that is related to the carrier mobility [82]. A
transistor operated at stronger inversion levels will therefore experience a
larger absolute increase in VEFF. This is in contrast with the threshold voltage
Vt, which is independent of the inversion level and has a linear temperature
coefficient of approximately −1mV/°C, as experimentally verified.
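\begin{align}
V_{dsi} &= V_{gs0} + V_{gs20} - V_{gs19} \tag{4.1} \\
V_{Ii} &= V_{gs0} + V_{gs20} < 1.2\,\mathrm{V} \tag{4.2} \\
V_{dsor} &= V_{dso} + I_{Q0} \cdot R_{metal0} + V_{be0} - V_{be1} - I_{Q1} \cdot R_{metal1} \tag{4.3} \\
&\approx V_{dso} \,\Big|_{\, j_{cm0} = j_{cm1},\; R_{metal1} \approx 2 \cdot R_{metal0}} \tag{4.4} \\
V_{c0} &> V_{dso}^{max} + V_{ce0}^{min} > 500\,\mathrm{mV} \tag{4.5}
\end{align}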
Since only M18 resides inside the feedback loop, care should be taken in
routing the other NMOS-outputs if used to bias a PAM-4 stage. More
specifically, the IR-drop should be matched in both cases as described in
Eq. (4.3).
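The matching condition of Eq. (4.3) and Eq. (4.4) can be made concrete with numbers. In the sketch below, the routing resistances, the base-emitter voltage and Vdso are hypothetical values; only the structure of the expression and the 2:1 current ratio between the MSB and LSB branch are taken from the text.

```python
def v_dsor(v_dso, i_q0, r_metal0, v_be0, i_q1, r_metal1, v_be1):
    """Right-hand side of Eq. (4.3), all quantities in volt, ampere and ohm."""
    return v_dso + i_q0 * r_metal0 + v_be0 - v_be1 - i_q1 * r_metal1


I_Q0, I_Q1 = 12.6e-3, 6.3e-3      # MSB and LSB branch currents (ratio of two)
V_BE = 0.85                       # hypothetical, equal for both when jcm0 = jcm1

# Routing resistance of the LSB branch twice that of the MSB branch: IR drops cancel.
matched = v_dsor(0.2, I_Q0, 1.0, V_BE, I_Q1, 2.0, V_BE)
# Equal routing resistance in both branches: a residual IR drop remains.
unmatched = v_dsor(0.2, I_Q0, 1.0, V_BE, I_Q1, 1.0, V_BE)
print(matched, unmatched)
```

With these example values the unmatched routing leaves a 6.3 mV offset on top of Vdso, whereas the 2:1 resistance ratio removes it.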
Small-signal analysis: Since the output current can change from 50 µA
to 12.6mA, care should be taken to stabilize the feedback loop under all
conditions. Furthermore, to improve Vds matching, the properties of the
closed-loop transfer function H = vo/vi at DC need to be derived. This will be
carried out by the general feedback theorem (GFT) that decomposes H in
less complex second level transfer functions, like the loop gain T [83]. More
specifically, the 1-GFT will be used for hand calculations, while the 2-GFT
is used in simulation.
The simplified small-signal model of the generic current mirror is drawn
in Fig. 4.20. A voltage source vi excites the drain of M0 to emulate Vdsi
variations while output voltage vo is related to variations in Vdso. Due to the
beta helper configuration, vi propagates to the gate of M18. The multiplication
factor m represents the summation of multiple transistor output currents, e.g.
Io18-Io7 connected to the emitter of Q0. The collector of Q0 is connected to
the emitter(s) of a buffer (or a differential pair) which is equivalent to a low
impedance to ground. The low impedance is assumed negligible and the
collector is directly connected to ground. The operational amplifier A0 is
assumed to have infinite input impedance and zero output impedance and
has internal dominant pole compensation to fix the gain-bandwidth product
ω0dB. Its frequency response A(s) is described by Eq. (4.6).
\begin{align}
A(s) &= \frac{A(0)}{1 + \frac{s}{\omega_{p1}}} \tag{4.6} \\
\omega_{0dB} &= A(0) \cdot \omega_{p1} = \frac{g_{mA}}{C_{cA}} \tag{4.7}
\end{align}
Figure 4.20: Small-signal model of the generic current mirror with the 1-GFT probe inserted at
the input terminals of the operational amplifier A0 to verify stability and closed-loop gain vo/vi.
The closed-loop gain H should ideally be unity in order to provide perfect
Vds matching, as described by H∞ the ideal closed-loop gain. However,
finite loop gain and direct forward transmission through the feedback path
will let H deviate from H∞, determined by discrepancy factors D and Dn.
The null loop gain Tn determines Dn and is found by injecting both vz and
vi to null vo (null double injection). We are only interested in the null loop
gain at DC (Tn(0)).
\begin{align}
H &= H_\infty D D_n = H_\infty \frac{1 + \frac{1}{T_n}}{1 + \frac{1}{T}} \tag{4.8} \\
H_\infty &= \left. \frac{v_o}{v_i} \right|_{v_y = 0} = 1 \tag{4.9} \\
T_n(0) &= \left. \frac{v_y}{v_x} \right|_{v_o = 0} = A(0) \frac{1 + g_{m0} r_{\pi 0}}{m \cdot g_{m18} r_{\pi 0}} \approx A(0) \frac{g_{m0}}{m \cdot g_{m18}} \tag{4.10}
\end{align}
The loop gain T relates to D and is found by single injection of vz while
setting vi equal to zero. Since stability of the loop is another concern, T needs
to be evaluated at AC as well, in contrast to Tn and H. In a first stage, all
AC impedances are omitted to calculate Tn(0) easily. Afterwards, capacitors
Cd18 and Cpi0 are introduced by applying the extra element theorem (EET)
consecutively to obtain T in low-entropy pole-zero form [84].
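$$ T = \left.\frac{v_y}{v_x}\right|_{v_i = 0} = T_0 \frac{1 + \frac{s}{\omega_{z1}}}{\left(1 + \frac{s}{\omega_{p1}}\right)\left(1 + \frac{s}{\omega_{p2}}\right)} \tag{4.11} $$
$$ T_0 = \frac{A(0)}{1 + \frac{1}{\left((r_{o18}/m) \parallel r_{o0}\right) g_{m0}}} \approx A(0) \tag{4.12} $$
$$ \omega_{p2} = \frac{g_{m0}}{m \cdot C_{d18} + C_{\pi 0}} \tag{4.13} $$
$$ \omega_{z1} = \frac{g_{m0}}{m \cdot C_{\pi 0}} > \omega_{p2} \tag{4.14} $$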
80 Chapter 4
Finally, the closed-loop gain H at DC can be found after substituting the
expressions for Tn(0) and T0.
$$ H(0) = \frac{1 + A(0)\,\frac{g_{m0}}{m \cdot g_{m18}}}{\left(1 + A(0)\right)\frac{g_{m0}}{m \cdot g_{m18}}} \tag{4.16} $$
Interestingly, if the transconductance of Q0 and m · M18 could be dimensioned
identical, H would become unity. However, since M18 operates in moderate
inversion, m · gm18 will always be smaller than gm0. For this application, a
steady-state error of 1%, assuming a step response, is sufficient to minimize
systematic Vds mismatch. This translates to a gain of 49, which can be easily
accomplished with a single-stage OTA. Analyzing the non-dominant pole
ωp2 reveals that it falls to lower frequencies as the current drops. Filling in
the operating point parameters for Ii = 50 µA suggests that fp2 ≈ 500 MHz.
Hence, gmA and CcA should be designed to satisfy ω0dB < ωp2/3, which leaves
sufficient phase margin and gain A(0); see Fig. 4.21 for a visual
representation of the GFT decompositions.
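As a numerical illustration of Eq. (4.16), the minimal sketch below evaluates the DC closed-loop gain and the opamp gain needed for a given steady-state error. The ratio k = gm0/(m · gm18) is introduced here for convenience; the value k = 2 is an assumption that happens to reproduce the gain of 49 quoted above, and the text does not state which ratio was actually used. For very large k the same expression tends towards a required gain of 99.

```python
def closed_loop_dc_gain(a0, k):
    """DC closed-loop gain H(0) of Eq. (4.16), with k = gm0 / (m * gm18)."""
    return (1 + a0 * k) / ((1 + a0) * k)


def required_opamp_gain(max_error, k):
    """Smallest A(0) for which 1 - H(0) <= max_error (meaningful for k > 1)."""
    # Rearranged from Eq. (4.16): 1 - H(0) = (k - 1) / (k * (1 + A(0)))
    return (k - 1) / (max_error * k) - 1


if __name__ == "__main__":
    k = 2.0  # assumed transconductance ratio, not a value taken from the text
    a0 = required_opamp_gain(0.01, k)
    print(f"A(0) >= {a0:.1f}, H(0) = {closed_loop_dc_gain(a0, k):.4f}")
```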
Figure 4.21: GFT decomposition of closed-loop voltage gain for Vds matching with
AC-response of loop gain T .
Both transfer functions T and H are simulated in Cadence Virtuoso for the
final design under realistic load conditions, i.e. an output current from 50 µA
to 12.6 mA is delivered to an output driver similar to Fig. 4.17. The input
current Ii and/or the number of parallel NMOS transistors m is modified
to accommodate the wide current range.
[Figure 4.22 plot: magnitude (dB) versus frequency (Hz) from 10^3 to 10^11 Hz; traces for T and H at IQ0 = 50 µA and IQ0 = 12 · 1.05 mA]
Figure 4.22: Simulation of H and T for two extreme MSB output current scenarios shows a
stable closed-loop response close to unity.
Simulation results plotted in Fig. 4.22 reveal a second-order zero occurring
in T around 500 MHz. This is introduced by the finite impedance of the
opamp, which was assumed infinite. Since the second-order zero lies far
enough above the gain-bandwidth of 10 MHz, it poses no threat to stability
as the phase margin remains higher than 80°. Although the estimated T
deviates from the simulated one, both indicate that the system is stable.
The closed-loop gain H at DC amounts to −0.0905 dB worst case, thereby
satisfying the 1% steady-state error specification on the systematic Vds
mismatch.
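The phase margin implied by the pole-zero form of Eq. (4.11) can be estimated numerically. The sketch below is only an illustration under simplifying assumptions: the zero ωz1 is optional and omitted in the tests, the second-order zero seen in simulation is not modeled, and the values A(0) = 49, a gain-bandwidth of 10 MHz and fp2 = 500 MHz are taken from the text but combined here for illustration only.

```python
import cmath
import math


def loop_gain(f, t0, fp1, fp2, fz1=None):
    """Loop gain T at frequency f (Hz) in the pole-zero form of Eq. (4.11)."""
    num = t0 * (1 + 1j * f / fz1) if fz1 else t0
    return num / ((1 + 1j * f / fp1) * (1 + 1j * f / fp2))


def phase_margin_deg(t0, fp1, fp2, fz1=None):
    """Locate the unity-gain crossover by bisection on a log axis, return the phase margin."""
    lo, hi = math.log10(fp1), 12.0  # search between fp1 and 1 THz
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if abs(loop_gain(10 ** mid, t0, fp1, fp2, fz1)) > 1:
            lo = mid
        else:
            hi = mid
    t = loop_gain(10 ** lo, t0, fp1, fp2, fz1)
    return 180.0 + math.degrees(cmath.phase(t))


if __name__ == "__main__":
    a0, fp2 = 49.0, 500e6
    for gbw in (10e6, fp2 / 3):  # reported gain-bandwidth and the omega_p2 / 3 limit
        print(f"GBW = {gbw / 1e6:.0f} MHz -> PM = {phase_margin_deg(a0, gbw / a0, fp2):.1f} deg")
```

With these assumptions, a gain-bandwidth at the ωp2/3 limit still leaves a phase margin above 70°, and the 10 MHz design sits close to 90°, in line with the margin of more than 80° reported above.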
Impact of trace inductance As noted earlier, the generic current mirror is
placed outside of the data path and sinks current from the cascode transistors
Q0, Q1 through a long interconnection that can reach up to 700 µm. The
bias voltage for the base of the cascode transistors Vb also needs to cover
this distance. Therefore, stability of the feedback loop is verified again after
insertion of a 1 nH inductance for the Vb and Io trace. Figure 4.23 confirms
that the impact on the closed-loop gain H is limited to frequencies higher
than 1 GHz where some resonances start to appear. No amplitude peaking is
present, indicating that stability is not compromised.
[Figure 4.23 plot: |H| (dB) versus frequency (Hz) from 10^3 to 10^11 Hz for IQ0 = 50 µA, 12 · 50 µA, 1.05 mA and 12 · 1.05 mA]
Figure 4.23: No harmful impact on H is observed when a 1 nH trace inductance is inserted
in the feedback loop for several output current configurations.
4.3.3 High-frequency modeling of current-routing bus
The previous analysis of the current mirror already revealed that an inductor-
equivalent model of the current routing trace did not harm the stability of the
feedback loop. However, a more complex modeling approach is necessary
to investigate the impact of the routing bus on the modulation behavior
of the output driver. Three routing scenarios are simulated in Momentum
Microwave: first, the MSB and LSB tail current trace for the A0 driver being
280 µm long and second, the emitter follower current trace for the predriver
with a worst-case length of 630 µm. Additionally, the trace for the base
voltage Vb of the cascode transistor from Fig. 4.19 is simulated as a 630 µm
trace surrounded by two floating traces. Generally, edge effects are enabled
in Momentum while accounting for 100 cells per wavelength for frequencies
up to 40GHz. Figure 4.24a shows the simulated layout of the MSB and LSB
tail current trace where ports P1 and P3 connect to the generic current mirror
outputs and ports P2 and P4 to the emitter of the MSB and LSB cascode
transistor respectively. Both tail current traces are surrounded by the other
tap driver current traces to include the coupling capacitance in the result.
These traces are left floating for simplicity. The Smith chart in Fig. 4.24b
reveals that the impedance seen from the transistor's emitter can be capacitive
or inductive depending on the location of the port and the length of the
trace. For example, the LSB trace is a longer trace with a shorter open stub,
therefore behaving more inductively than its MSB counterpart.
[Figure 4.24 panels: (a) simulated layout of the tail current routing bus; (b) Smith chart of S22 and S44]
Figure 4.24: (a) Tail current routing bus for 4-tap FFE PAM-4 output driver with ports
assigned to the MSB A0 (P1-P2) and LSB A0 (P3-P4) tap. (b) The Smith chart reveals that
the MSB stage sees a capacitive impedance (S22) while the LSB stage sees an inductive
impedance (S44).
Similar Momentum simulations were executed for the emitter follower
current and the base voltage buses. The corresponding S-parameter file is
subsequently imported in a Cadence Virtuoso test bench. This test bench
comprises the LSB and MSB output driver, each preceded by its predriver.
All circuits are biased with the generic PAM-4 current mirror, which is
connected to the (pre)drivers either directly or through the S-parameter block
depending on the state of an ideal switch. A load of 120 fF in parallel with
30 Ω was placed at the output of the driver circuit and post-layout parasitics
were included.
Output voltage eye diagrams at 40 Gb/s of the MSB and LSB stage can be
found in Fig. 4.25. The ideal connection to the current mirror shows slightly
more overshoot on the falling edge than the ADS simulated connection. The
difference, however, remains marginal. It can thus be safely assumed that
placing the generic PAM-4 current mirror outside the data path and
connecting it through a long and wide current routing bus imposes no
noticeable performance penalty on the system.
[Figure 4.25 panels, voltage (V) versus time (ps): (a) MSB simulated trace, (b) MSB ideal trace, (c) LSB simulated trace, (d) LSB ideal trace]
Figure 4.25: 40Gb/s post-layout simulation of output driver and predriver using (a), (c)
S-parameter model or (b), (d) ideal trace as connection to generic PAM-4 current mirror.
4.4 Asymmetric pulse shaping for multi-level signals
4.4.1 Applying pulse width distortion
As observed and studied in Chapter 2, a VCSEL exhibits non-linear behavior
under large signal modulation resulting in fall times that tend to be longer
than the rise times. This translates to broader zeros than ones in the bit
pattern, corresponding to negative PWD⁶. Figure 4.26 shows a captured
NRZ eye diagram from a VCSEL together with an appropriate drive current
to center the level crossings again. Unfortunately, this side effect becomes
especially problematic to compensate when considering multi-level modula-
tion to increase the data rate.
Figure 4.26: The received eye diagram of a VCSEL at 28 Gb/s [69] exhibits non-negligible
negative PWD that can be compensated by applying a drive current with opposite PWD.
A first attempt to resolve this issue was introducing PWD in the main MSB
and LSB path. This concept was implemented in the first generation driver
designed for Mirage, being a 2-tap FFE PAM-4 driver [46]. The PWD
was generated inside the predriver by creating a DC-offset voltage through
imbalanced current sources connected to the CML stage and voltage buffer.
A mirror ratio of two was used for the offset currents to offer the correct
scaling between the MSB and LSB stages. Applying PWD is not only
capable of adjusting the level crossing, but if combined with equalization,
can also lead to asymmetric equalization [41]. In case of a signal with
PWD>0 like in the right part of Fig. 4.26, this would result in a longer
pre-emphasis pulse for the rising edge and a shorter pre-emphasis pulse for
the falling edge.
Figure 4.27 is obtained at the electrical output of the driver for minimum and
maximum settings of PWD while also applying some equalization. Note
that the level crossings shift up or down for all three eyes simultaneously
depending on the setting. Above all, either the top eye or the bottom eye
gets expanded because of the asymmetric pre-emphasis effect. It needs
to be clarified that the minimum setting results in negative PWD for the
output voltage. Since the drive current is inverted relative to the drive
voltage, the minimum setting will thus lead to positive PWD for the drive current.
⁶ PWD is defined as (T1 − T0)/2 [74].
Figure 4.27: Probing of the electrical output voltage (inverted version of drive current) at
20GBd of a 2-tap FFE PAM-4 driver programmed with (a) minimum and (b) maximum
PWD settings. (20 ps/div, 50mV/div)
It becomes really interesting when monitoring the output of a >15 GHz
VCSEL driven by the same PAM-4 driver while applying some 2-tap FFE and
minimum PWD, see Fig. 4.28. Although the middle eye is quite symmetrical,
the upper and lower eye show opposite distortion, i.e.
PWD<0 for the upper eye and PWD>0 for the lower eye. This is likely caused
by the bias-dependent modulation behavior of the VCSEL. In addition, the
top eye is more open due to the asymmetric pre-emphasis. Hence, it is
not possible with this topology to optimize all eyes simultaneously.
A better approach would be to provide separate duty cycle control for the
LSB and MSB drivers. Another alternative is discussed in the next section.
Figure 4.28: (a) 14GBd and (b) 20GBd optical eye diagrams while applying 2-tap equal-
ization and minimum PWD [46]. (20,10 ps/div, 50mV/div)
4.4.2 Selective falling-edge pre-emphasis
Concept Experiments utilizing the previous pulse shaping method based
on [46] revealed that the transitions to the lower level are typically the
slowest. This leads to the bottom eye being less open than the middle and
top eye and can negatively impact the BER. Hence, it would be advantageous
to pre-emphasize the bottom-level transitions apart from the regular FFE.
In particular, each transition to the bottom level needs to be boosted. To this
end, a falling-edge detection circuit needs to sense the instant when both MSB and
LSB become logic low and generate a falling pulse accordingly. Figure 4.29
shows a conceptual diagram to realize this functionality, suitable for
cathode-driving and multi-level modulation formats. It comprises a NOR-gate, a
delay cell with time delay Td and an OR-gate with one inverted input to
create a logic output current ISFEP with pulse width Twp. This architecture
allows the SFEP output current to be summed with the modulation
current by implementing the final OR-gate as a CML stage. This is clearly an
improvement over [85], which requires a PMOS stage to insert the falling-edge
current pulse at the output and thereby introduces a speed penalty.
Figure 4.29: Timing and logic gate diagram to generate a falling-edge pre-emphasis current
pulse ISFEP when the MSB and LSB signals transition to the bottom level
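The logic of the conceptual diagram can be sketched at sample level. The function below is a behavioral illustration only, not the CML implementation: bit streams are lists of 0/1 samples, the delay Td is expressed as a whole number of samples, and the delay line is assumed to start out holding the first NOR output so that no spurious pulse appears at the start. The function and variable names are hypothetical.

```python
def sfep_pulse(msb, lsb, delay_samples):
    """Behavioral model of the SFEP logic: NOR gate, delay cell, OR gate with one inverted input."""
    # X is high only while both MSB and LSB are low (bottom PAM-4 level).
    x = [int(not (m or l)) for m, l in zip(msb, lsb)]
    # X' is X delayed by Td; the delay line is assumed to be preloaded with x[0].
    x_delayed = [x[0]] * delay_samples + x[:len(x) - delay_samples]
    # Output is (NOT X) OR X': it goes low for Td after each transition to the bottom level.
    return [int((not a) or b) for a, b in zip(x, x_delayed)]


if __name__ == "__main__":
    msb = [1, 1, 1, 0, 0, 0, 0, 1]
    lsb = [1, 0, 1, 0, 0, 0, 0, 1]
    print(sfep_pulse(msb, lsb, delay_samples=2))
```

In this example the output stays high except for two samples following the instant at which both inputs become low, so the width of the falling pulse equals the programmed delay.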
Implementation A CML topology is chosen for the implementation of
the logic gates to facilitate high-speed operation. Figure 4.30 shows the
design of the NOR-gate [86]. The output Vo can be expressed as a function of
the inputs ViA and ViB:
$$ \overline{V_o} = V_{iB} + \overline{V_{iB}} \cdot V_{iA} = V_{iB} + V_{iA} \tag{4.17} $$
$$ V_o = \overline{V_{iB} + V_{iA}} \tag{4.18} $$
Since the outputs of the MSB and LSB flip-flops have a common-mode
output voltage close to supply voltage, a level shifter in the form of Q2 is
necessary to interface with the lower differential pair Q1. Unfortunately,
this creates a small delay in signal ViB that is not present in ViA. Headroom
of the following stage is enlarged by lowering the common-mode output
voltage Vocm through the pi-configuration of R0 and R1. The output swing
Vod is also affected hereby, with
$$ V_{od} = I_{b0} \cdot \left(R_0 \parallel \frac{R_1}{2}\right) \tag{4.19} $$
$$ V_{ocm} = V_{dd} - \frac{I_{b0}}{2} \cdot R_0 \tag{4.20} $$
The current density jcm for input pairs Q0 and Q1 is dimensioned to reduce
the parasitic emitter resistor, thereby targeting a higher gain.
Figure 4.30: CML representation of a NOR-gate with inputs ViA and ViB and output Vo.
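Equations (4.19) and (4.20) are straightforward to evaluate numerically. In the short sketch below, the supply voltage, tail current and resistor values are placeholders chosen purely for illustration; the text does not give the actual component values.

```python
def parallel(r_a, r_b):
    """Equivalent resistance of two resistors in parallel."""
    return r_a * r_b / (r_a + r_b)


def nor_gate_output_levels(vdd, ib0, r0, r1):
    """Output swing (Eq. 4.19) and common-mode output voltage (Eq. 4.20)."""
    vod = ib0 * parallel(r0, r1 / 2)
    vocm = vdd - ib0 / 2 * r0
    return vod, vocm


if __name__ == "__main__":
    # Hypothetical values: 2.5 V supply, 2 mA tail current, R0 = 100 ohm, R1 = 200 ohm.
    vod, vocm = nor_gate_output_levels(vdd=2.5, ib0=2e-3, r0=100.0, r1=200.0)
    print(f"Vod = {vod * 1e3:.0f} mV, Vocm = {vocm:.2f} V")
```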
The delay stage is reused from the retiming chain, see Fig. 4.10 for the design.
The related time delay amounts to 10 ps or 20 ps, depending on the settings.
The delay stage and the NOR-gate are driving the inputs of the SFEP output
stage, drawn in Fig. 4.31. The output current ISFEP can be expressed as a
function of the inputs ViA (X) and ViB (X′),
$$ I_{SFEP} = V_{iB} + \overline{V_{iA} + V_{iB}} = \overline{V_{iA}} + V_{iB} \tag{4.21} $$
leading to an OR-gate with one input being inverted. This behavior is in
accordance with the conceptual diagram from Fig. 4.29.
Figure 4.31: The SFEP output stage representing an OR-gate with inputs ViA and ViB and
output current ISFEP .
The time delay Td is set by a programmable delay cell but also includes
the fixed propagation time of the level shifter Q3. The biasing Ib1 of Q3
is equipped with cross-coupled cascode transistors Q4 to provide extra
bandwidth with less current. The full width at half maximum of the ISFEP
pulse can be set to either 17 ps or 27 ps. In addition, the amplitude of the
current pulse can be modified by the tail current Ib0 from 0 to 2.4mA.
Simulation results A 25 GBd simulation of the complete PAM-4 driver
IC modulating a rate equation model of a USC VCSEL is run to
showcase the effect of applying SFEP to a PAM-4 signal. Post-layout
parasitics are included for the 4-tap driver and predriver circuit as well as
for the SFEP block. The drive current is already slightly pre-emphasized
by enabling tap coefficient A1. The driver output is inherently loaded with
a pad capacitance of 120 fF, while the bond wire inductance is chosen at
250 pH. The inductance is kept small to prevent an overly peaked response
from masking the non-linear modulation characteristics of the VCSEL. Eye
diagrams are plotted in Fig. 4.32 with SFEP set at 0 percent and 29 percent
relative to its maximum value. The bias current was modified according
to the SFEP setting to maintain approximately the same average current
through the VCSEL. As such, the modulation bandwidth is kept constant for
both scenarios.
[Figure 4.32 panels (a) and (b): optical power (mW) versus time (ps)]
Figure 4.32: 25 GBd post-layout simulation of the PAM-4 driver chip modulating a rate
equation model of a USC VCSEL showcasing the effect of SFEP: (a) 0% of SFEP, (b) 29%
of SFEP.
Due to the relaxation oscillation of the VCSEL, the falling-edge transition
is slower than the rising edge. Consequently, the time instant of maximum
vertical opening for the bottom eye is shifted to the right compared to the
middle and upper eye in Fig. 4.32a. A fortunate effect of boosting the falling
edge separately, is that the time instants for maximum vertical openings
can be aligned again, see Fig. 4.32b. However, the transitions inside the
middle and upper eye undergo more damping. This behavior is quantitatively
studied in greater detail by calculating the height of each PAM-4 eye for an
increasing percentage of SFEP, as plotted in Fig. 4.33. The exact calculation
is based on evaluating the corresponding histogram while looking for a time
instant where the sum of the individual eye heights reaches a maximum. The
best performance boost occurs at an SFEP setting of 29%. The lower eye is
enlarged by 18% while the middle and upper eye are marginally deteriorated
by 5% and 3% respectively.
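The search for the sampling instant with the largest summed eye height can be sketched as follows. This is a simplified stand-in for the histogram-based evaluation mentioned above: it assumes that, for every time instant, the samples have already been grouped by transmitted PAM-4 level, and it defines an eye height as the gap between the lowest sample of one level and the highest sample of the level below. The data layout and function names are hypothetical.

```python
def eye_heights_at(samples_by_level):
    """Heights of the bottom, middle and top eye from four lists of samples (lowest level first)."""
    return [min(samples_by_level[i + 1]) - max(samples_by_level[i]) for i in range(3)]


def best_sampling_instant(eye):
    """Return the time instant at which the sum of the three eye heights is largest.

    eye maps a time instant to four lists of samples, one list per PAM-4 level.
    """
    return max(eye, key=lambda t: sum(eye_heights_at(eye[t])))


if __name__ == "__main__":
    eye = {
        0: [[0.0, 0.1], [1.0, 1.2], [2.0, 2.1], [3.0, 3.1]],
        5: [[0.0, 0.4], [1.0, 1.5], [2.0, 2.4], [3.0, 3.3]],
    }
    t_best = best_sampling_instant(eye)
    print(t_best, eye_heights_at(eye[t_best]))
```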
[Figure 4.33 plot: eye height (mV) versus SFEP setting (0, 14, 29 and 43%) for the top, middle and bottom eye]
Figure 4.33: Evaluation of the height of each individual PAM-4 eye at 25 GBd while
modulating a rate equation model of a USC VCSEL for discrete settings of SFEP.
The simulation results clearly demonstrate the performance enhancement
obtained by enabling the selective falling-edge pre-emphasis circuit. It
should be noted though that the effectiveness degrades in scenarios where
symmetrical transitions are present because of the damping effect on the
rising edges. This chapter will be concluded with PAM-4 driver experiments
that explore the performance in a BTB electrical and optical link making
use of the different equalization techniques. In addition, transmission
experiments with the first generation MIRAGE transmitter compare the
dispersion penalties between NRZ and PAM-4 modulation.
4.5 Experiments
4.5.1 A 2-tap FFE transmitter IC modulating a 1.5µm VCSEL
with NRZ and PAM-4
Based on the journal paper reporting on the first generation PAM-4 driver
for MIRAGE:
W. Soenen, R. Vaernewyck, X. Yin, S. Spiga, M.-C. Amann, K. S. Kaur,
P. Bakopoulos and J. Bauwelinck, “40 Gb/s PAM-4 Transmitter IC for
Long-Wavelength VCSEL Links,” in IEEE Photonics Technology Letters,
vol. 27, no. 4, pp. 344-347, Feb. 2015.
PAM-4 driver circuit
The presented transmitter IC is capable of driving a 1550 nm common-
anode VCSEL array although the experiments are restricted to one channel.
Figure 4.34 shows a block diagram of the data path. The PAM-4 drive
current for the VCSEL is generated on-chip from two NRZ data streams,
respectively referenced as MSB and LSB. A conventional PAM-4 driver
topology is used which combines the MSB and LSB currents (imsb, ilsb) at
the output node [87]. A unity-gain current buffer at the output prevents
capacitive loading from the VCSEL bias current source (ibias) and isolates the
MSB and LSB main drivers from each other. The voltage difference between
the driver supply (vdd1) and VCSEL anode supply (vdd2) was exploited to
provide extra bias current for the laser. PWD introduced by the asymmetric
response of the VCSEL can be partially corrected by introducing an offset
voltage to the predrivers (PWD control). The binary input streams MSB
and LSB are synchronized with each other using flip-flops operating at a
full-rate clock. These flip-flops also generate the time delay for the 2-tap
symbol-spaced FFE output stage. The two tap coefficients correspond to
the programmable tail-currents i0 and i1 on Fig. 4.34. Four equally spaced
pre-emphasized levels are created by sizing these tail-currents of the MSB
and LSB main drivers with a fixed ratio of two. For the first generation,
the back termination resistor RBT was chosen equal to 50 Ω in order to
verify the electrical performance in a matched 50 Ω environment. This
output impedance is associated with a drive efficiency of roughly 50% since
the series resistance of the laser varies between 50 and 60 Ω. Compared
to the 50 Ω/120 Ω impedance ratio of the 850 nm VCSEL driver in [51],
long-wavelength VCSELs seem to be beneficial in terms of drive efficiency.
Figure 4.34: Simplified block diagram of the data path of the first generation PAM-4
transmitter.
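The drive efficiency quoted for the back termination can be reproduced with a simple current divider. The sketch below assumes that the drive efficiency is the fraction of the driver output current that reaches the laser when the back termination resistor is in parallel with the laser series resistance; this interpretation is an assumption, chosen because it matches the figure of roughly 50% for the values given in the text.

```python
def drive_efficiency(r_bt, r_laser):
    """Fraction of the output current flowing into the laser (current divider with RBT)."""
    return r_bt / (r_bt + r_laser)


if __name__ == "__main__":
    for r_laser in (50.0, 60.0, 120.0):  # series resistances mentioned in the text, in ohm
        print(f"RBT = 50 ohm, Rlaser = {r_laser:.0f} ohm -> {100 * drive_efficiency(50.0, r_laser):.1f} %")
```

Under this assumption, a 120 Ω laser driven from a 50 Ω termination receives less than a third of the output current, which illustrates why the lower series resistance of the long-wavelength VCSEL is favorable.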
Experiments
The VCSEL emits light at a wavelength of 1550 nm and was developed by
TUM. The structure is based on the short-cavity BTJ concept [29]. The
characteristics of the mounted VCSEL with a BTJ diameter of 5 µm are a
3 dB bandwidth exceeding 15 GHz according to Fig. 4.35a, a series resistance
between 50 and 60 Ω, a maximum output power of 4.2 mW and a slope
efficiency of 0.35 W/A. The threshold and roll-over current are respectively
1.1mA and 15mA. The VCSEL is biased with an average current that can
vary between 7.7 and 8.8mA depending on the experiment.
The transmitter IC is fabricated in a 0.13 µm SiGe BiCMOS process. The die
measures 2860 µm x 1330 µm and is wire bonded onto a Rogers RO4003C
LoPro test board. The length of the bond wires is reduced by mounting the die
in a cavity. The driver is able to directly modulate a common-anode VCSEL
with PAM-4 or NRZ signals. This functionality allows us to compare the
performance of both modulation formats at equal bit rate in a VCSEL-based
SMF link for several transmission distances. The type of modulation format
generated by the IC depends on the data pattern entering its MSB and LSB
inputs. A PAM-4 VCSEL drive current is created by introducing different
data patterns in the LSB and MSB inputs, whereas NRZ is generated when the
two input data streams are the same. This was implemented experimentally
by introducing a delay difference of around half a PRBS period between
the LSB and MSB data streams for PAM-4 or synchronizing both data
streams with each other for NRZ. The AC-coupled MSB and LSB inputs
are excited single-endedly with a 2^7 − 1 PRBS of 300 mV peak-to-peak by
an SHF 12100B pattern generator. The transmitter drive current settings
are reconfigurable through a serial interface (SPI). Figure 4.36 shows a
micrograph of the transmitter IC and the long-wavelength VCSEL array.
[Figure 4.35 plots: (a) normalized S21 response (dB) versus frequency (GHz) for bias currents of 3.75 mA, 6.0 mA, 9.75 mA and 15 mA; (b) optical power (mW) versus drive current (mA)]
Figure 4.35: Characteristics of the 1550 nm VCSEL under test: (a) S21 modulation response
for various bias currents. (b) L/I curve displaying a bias current used in one of the experiments
to obtain a damped response.
Figure 4.36: Micrograph of the transmitter IC and the 1550 nm VCSEL array.
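The way in which the input patterns select the modulation format can be illustrated with a short behavioral sketch. It generates one period of a 2^7 − 1 PRBS with the common polynomial x^7 + x^6 + 1 (the polynomial is an assumption, as the text only states the length), weights the MSB twice as heavily as the LSB, and compares identical streams with streams shifted by about half a period.

```python
def prbs7(n_bits, seed=0x7F):
    """Generate n_bits of a 2^7 - 1 PRBS using the polynomial x^7 + x^6 + 1."""
    state = seed
    bits = []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        bits.append(new_bit)
    return bits


def pam4_symbols(msb, lsb):
    """Combine two bit streams with a 2:1 weighting, as the MSB and LSB drivers do."""
    return [2 * m + l for m, l in zip(msb, lsb)]


if __name__ == "__main__":
    period = 2 ** 7 - 1
    msb = prbs7(period)
    shift = period // 2                      # roughly half a PRBS period
    lsb_shifted = msb[shift:] + msb[:shift]  # decorrelated copy of the same PRBS
    print(sorted(set(pam4_symbols(msb, msb))))          # identical streams: NRZ
    print(sorted(set(pam4_symbols(msb, lsb_shifted))))  # shifted streams: PAM-4
```

Identical streams only produce the two outer levels, which is the NRZ case, while the shifted copy exercises all four levels.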
Nominal supply voltages are 2.5 V for the transmitter (vdd1) and 3.85 V
for the VCSEL anode (vdd2). The light from the VCSEL is captured with
a manually aligned flat-cut SMF. Best case alignment trials for the bias
condition shown on the L/I curve in Fig. 4.35b produced an optical power
of −2.3 dBm for 4.1 dBm of emitted power. The 6.4 dB coupling loss is
primarily caused by the inability of the positioner to control the tilt of the
fiber. An erbium-doped fiber amplifier combined with an optical attenuator
is therefore integrated into the measurement setup to provide the receiver
with a constant output power regardless of the alignment problems. The
spacing between the consecutive levels of the PAM-4 signal is preserved by
operating the 40G optoelectronic (O/E) receiver from Sumitomo in its linear
region. This restricts the average received power to a maximum of −4 dBm.
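The coupling loss follows directly from the two power levels quoted above. The helper functions below are a minimal sketch of that arithmetic, using only the standard dBm definition referenced to 1 mW; the function names are hypothetical.

```python
def dbm_to_mw(power_dbm):
    """Convert a power level in dBm to milliwatt."""
    return 10 ** (power_dbm / 10)


def coupling_loss_db(emitted_dbm, coupled_dbm):
    """Coupling loss in dB between the emitted and the fiber-coupled power."""
    return emitted_dbm - coupled_dbm


if __name__ == "__main__":
    loss = coupling_loss_db(4.1, -2.3)
    print(f"emitted: {dbm_to_mw(4.1):.2f} mW, coupled: {dbm_to_mw(-2.3):.2f} mW, loss: {loss:.1f} dB")
```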
The first experiment examines how chromatic dispersion affects the quality
of the PAM-4 and NRZ link at 28 Gb/s when covering a distance of 100 m
and 1 km. Electrical eye diagrams captured from the receiver with 50 GHz
remote sampling heads represent this effect visually while bathtub curves
of the horizontal and vertical eye margin attained with the SHF 11100B
BER analyzer verify this quantitatively. The results acquired in a BTB
configuration after optimization of the applied FFE serve as a reference for this
test. The doubling in spectral efficiency of PAM-4 relative to NRZ modulation
leads us to expect that the latter will experience more deterioration as the
distance increases. The differing bandwidth requirements are accentuated by
programming the transmitter at maximum drive current for both modulation
formats while separately optimizing the pre-emphasis. Note that the
available lab equipment forces us to carry out BER tests solely on the middle
eye of the PAM-4 signal, which maps to the transmitted MSB data. The
settings of the transmitter were optimized such that the upper and lower
eye of the measured PAM-4 signal show a comparable eye opening in our
experiments, making the BER measured on the middle eye a fair
representation. The second and last experiment explores the maximum bit
rate achievable with the PAM-4 VCSEL transmitter and compares the energy
efficiency with state-of-the-art 850 nm VCSEL transmitters and an externally
modulated laser.
Results and discussion
At a data rate of 28Gb/s, the long-wavelength VCSEL starts to introduce
ISI due to its limited bandwidth. As can be observed in the eye diagrams
of Fig. 4.37, the on-chip symbol-spaced FFE is able to restore the eye
again. However, the applied FFE reduces the optical modulation amplitude
(OMA) significantly more for NRZ. This means that the 4.7 dB power penalty
associated to PAM-4 is not valid anymore for bandwidth limited links and
has dropped to 3.7 dB in this case.
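The nominal PAM-4 penalty mentioned above is consistent with the usual amplitude argument: for the same peak-to-peak swing, each of the three PAM-4 eyes is one third of the NRZ eye. The sketch below evaluates that textbook expression, 10·log10(L − 1) for L levels. It is a generic illustration and not the measurement procedure of this work; it gives roughly 4.77 dB for four levels, close to the 4.7 dB quoted in the text.

```python
import math


def pam_power_penalty_db(levels):
    """Ideal power penalty of L-level PAM relative to NRZ at equal swing."""
    return 10 * math.log10(levels - 1)


if __name__ == "__main__":
    print(f"PAM-4 penalty: {pam_power_penalty_db(4):.2f} dB")
```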
[Figure 4.37 panels (a) to (d): received eye diagrams]
Figure 4.37: 28 Gb/s PAM-4 and NRZ received eye diagrams. Vertical scale is 50 mV/div
and horizontal scale is 20 ps/div for PAM-4 and 10 ps/div for NRZ diagrams. (a) BTB
without equalization (b) BTB (c) 100 m SMF (d) 1 km SMF
The −4 dBm constraint on the received power leads to an OMA of the PAM-4
middle eye which approximates the sensitivity level of the Sumitomo receiver.
All qualitative comparisons will therefore be made at a reference BER of
10^−9 to minimize the impact of the O/E receiver sensitivity. The bathtub
curves in Fig. 4.38 show that FFE restores the NRZ eye height and width by
30% and 17% respectively and makes the PAM-4 link error free again. The
inherent asymmetric rise and fall times of the VCSEL are not compensated
by the FFE, as can be derived from the asymmetric shape of the bathtub
curves and eye diagrams. Expanding the transmission distance to 100 m and
1 km reveals that both modulation formats experience a similar reduction in
time margin, which amounts to 35% at 1 km compared to the BTB case. There
is however a significant penalty offset noticeable in vertical eye margin. At
100 m, the NRZ eye height is compressed by 10% which even drops to
60% for 1 km of SMF. A decline of less than 5% in eye height confirms our
expectations that PAM-4 is much more robust against chromatic dispersion
than NRZ in directly modulated VCSEL links. Although the BER results
show that NRZ still beats PAM-4 in terms of absolute performance, it has
to be noted that a decisive conclusion can only be drawn when the receiver
is separately optimized for each modulation scheme. Parameters such as
sensitivity, pre-emphasis and dispersion will further reduce the power penalty
related to PAM-4 as the bit rate continues to rise. The stringent timing
margin of NRZ will also play a role in choosing the most efficient topology
for the future data center link.
[Figure 4.38 plots: log10(BER) versus relative delay (ps) for (a) PAM-4 and (b) NRZ, and log10(BER) versus threshold voltage (mV) for (c) PAM-4 and (d) NRZ; traces for BTB without FFE, BTB, 100 m SMF and 1 km SMF, with the BER = 10^−9 reference level]
Figure 4.38: Measured bathtub curves at 28Gb/s PAM-4 and NRZ modulation for several
lengths of SMF: (a),(b) Horizontal eye margin (c),(d) Vertical eye margin.
Error-free performance was maintained up to 40 Gb/s in a BTB PAM-4 setup
as shown in Fig. 4.39. The BER rises to 10^−11 and 10^−9 as the transmission
distance progresses from 100 m to 1 km. With an energy efficiency
of 9.4 pJ/bit, the presented PAM-4 VCSEL transmitter is able to compete with
the state-of-the-art SiGe BiCMOS 850 nm NRZ transmitters [42, 51] listed
in Table 4.5. It is almost three times more efficient than a long-wavelength
transmitter consisting of an electro-absorption modulated edge-emitting
laser [88]. The CMOS driver reports the best efficiency of them all [40], but
the four-stage design (no FFE) is far less complex than our design
and 32 nm SOI CMOS is a much more advanced process than 0.13 µm SiGe
BiCMOS.
[Figure 4.39 panels: (a) received eye diagrams; (b) log10(BER) versus relative delay (UI) for BTB, 100 m SMF and 1 km SMF]
Figure 4.39: 40Gb/s PAM-4 transmission experiment: (a) Received eye diagram in BTB
setup (bottom) and after 1 km of SMF (top), 50mV/div, 10 ps/div. (b) Bathtub curve of
horizontal eye margin for several configurations.
Table 4.5: Comparison of the state of the art in O/E transmitters at the time of publication.
The energy efficiency is defined as the total power dissipated by the driver circuit and the
laser diode divided by the achieved bit rate.
Reference                    [40]             [42]           [51]           [88]               This work
Technology                   32 nm SOI CMOS   0.13 µm SiGe   0.13 µm SiGe   150 GHz InP-DHBT   0.13 µm SiGe
Bit rate (Gb/s)              35               40             64             56                 40
Modulation format            NRZ              NRZ            NRZ            NRZ                PAM-4
Laser bandwidth (GHz)        N/A              16             26             35                 >15
Supply voltage (V)           1.1/3.1          2.5/3.3        N/A            -5.2               2.5/3.85
Energy efficiency (pJ/bit)   1.3              7.8            14.1           26.8               9.4
Wavelength (nm)              850              850            850            1300               1550
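Since the energy efficiency in Table 4.5 is defined as total power divided by bit rate, the underlying power consumption can be recovered by multiplying the two rows. The sketch below does this for the tabulated values; the dictionary layout and names are only for illustration.

```python
def total_power_mw(efficiency_pj_per_bit, bit_rate_gbps):
    """Total dissipated power in mW; pJ/bit multiplied by Gb/s yields mW."""
    return efficiency_pj_per_bit * bit_rate_gbps


if __name__ == "__main__":
    # (energy efficiency in pJ/bit, bit rate in Gb/s) copied from Table 4.5
    table = {
        "[40]": (1.3, 35),
        "[42]": (7.8, 40),
        "[51]": (14.1, 64),
        "[88]": (26.8, 56),
        "This work": (9.4, 40),
    }
    for ref, (eff, rate) in table.items():
        print(f"{ref}: {total_power_mw(eff, rate):.1f} mW")
    print(f"Efficiency ratio [88] / this work: {26.8 / 9.4:.2f}")
```

The ratio of about 2.85 between [88] and this work corresponds to the "almost three times" stated in the text.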
4.5.2 Electrical characterization of the 4-tap FFE PAM-4 driver
Eye diagrams
To check whether the new design flow for the retiming stage can extend
the data rate beyond the first generation (32 GBd) and hopefully even achieve
the initial specification of 40 GBd, the intrinsic performance of the driver
is tested first. For this experiment, the driver inputs (CLK, MSB, LSB)
are wire bonded to single-ended transmission lines that connect through a
mini SMP cable to a 92 GS/s AWG. The output of the driver, which has a
160 Ω impedance, is probed with a GS-probe and loaded with the 50 Ω input
of a sampling oscilloscope. The data signal is a PRBS with an even run
length of 2^9 and has a 300 mVpp amplitude.⁷ The MSB and LSB signals are
decorrelated by programming a differential delay of 560 ps. The clock signal
operates at full rate with a 200 mVpp amplitude and is also generated with
the AWG using a sine wave. Figure 4.40 shows the captured eye diagrams at
the maximum achievable data rate, i.e. 36 GBd.
(a) PAM4 no FFE (b) PAM4
(c) NRZ LSB (d) NRZ MSB
Figure 4.40: Probed electrical output of driver displaying a PAM-4 eye at 36GBd without
and with equalization and its respective MSB and LSB components
It can be observed that the capacitive self-loading of the output stage
combined with the relatively large output impedance of 160 Ω severely limits
the bandwidth of the signal, necessitating the use of equalization to restore
the PAM-4 eye diagram. Evaluating the separate MSB and LSB signals that
compose the final PAM-4 signal shows a distinct factor-of-two relation in
amplitude with similar pulse shape. Unfortunately, operation at 40 GBd
resulted in noisy eye diagrams, again suggesting retiming issues. To identify
the actual cause of the problem, the synchronization window is characterized
in the next paragraph.
⁷ The run length of the PRBS was restricted to an even number by the AWG to create a
rational identical sample rate for all channels: MSB, LSB and CLK. This is necessary to
keep all generated signals synchronized.
Synchronization window
The synchronization window can be determined by providing a stimulus
similar to the simulation conditions from Fig. 4.2c. More specifically, a
pulse train with a transition time of 14 ps is applied to the input of the
driver, and its delay relative to the clock is shifted until the output
becomes metastable. An example of a stable and a metastable output together
with the stimulus can be inspected in Fig. 4.41. The transitions of the
metastable output show up as noise due to the inherent sampling operation
of the oscilloscope (DCA-X 86100D). In this particular case, the
transition is sometimes skipped entirely, as indicated by the high-density flat
blue line.
Figure 4.41: Captured driver output (top waveform) at 36 GBd when retiming a pulse train
(bottom waveform) (a) inside the sync window and (b) outside the sync window
The synchronization window is finally found after subtracting the range of
delays corresponding to a metastable output from the bit period. This process
is repeated for various clock amplitudes at 36 GBd, and at 40 GBd while also
increasing the number of FFE taps, see Fig. 4.42. As expected
from the 36 GBd eye diagrams, the sync window is large enough to sustain
a fair amount of jitter on the data input signal while avoiding metastable
operation. Consistent with the simulation results from Fig. 4.8, the sync
window Tsync is very sensitive to the clock amplitude. This is especially
prominent in the 2-tap 40 GBd case where the sync window drops below
10 ps. Interestingly, reducing the number of FFE taps has a positive impact
on Tsync. As enabling extra FFE taps directly enables extra flip-flops, the
capacitive load for the emitter followers of the final clock buffer becomes
too large and degrades the gain at 40 GHz. Increasing the source clock
amplitude cannot resolve this issue as the amplifier cascade enters the limiting
region, a behavior already observed in Fig. 4.16.
Figure 4.42: Measured sync window decreases as the data rate and/or the number of taps
increases and becomes insufficiently wide at 40 GBd utilizing 2 taps. (Plot: Tsync (ps)
versus clock source peak-to-peak amplitude (mV) for 36 GBd with 2 taps, 40 GBd with 1 tap
and 40 GBd with 2 taps.)
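The extraction of the sync window from the measured delay range can be written down in a few lines. This is a minimal sketch: the delay values in the usage line are hypothetical placeholders, not measured data, and the function names are invented for this example.

```python
def bit_period_ps(baud_rate_gbd):
    """Symbol period in picoseconds for a baud rate given in GBd."""
    return 1e3 / baud_rate_gbd


def sync_window_ps(baud_rate_gbd, metastable_start_ps, metastable_stop_ps):
    """Sync window = bit period minus the delay range that gives a metastable output."""
    metastable_range = metastable_stop_ps - metastable_start_ps
    return bit_period_ps(baud_rate_gbd) - metastable_range


# Hypothetical example: at 40 GBd the output is metastable for delays of 4 ps to 20 ps.
print(sync_window_ps(40, 4.0, 20.0))
```

At 40 GBd the bit period is only 25 ps, so every picosecond of metastable delay range removes a noticeable share of the available window.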
Multiple solutions can be combined to tackle this problem. First of all,
the dimensions and the tail current of the flip-flops can be optimized, as
demonstrated in [76] by a potential reduction of 17% in power dissipation. A
negative capacitance can be placed in parallel through a negative impedance
converter [74] to further reduce the capacitive loading. In addition, the linear
range of the clock amplifier cascade can be broadened, making it feasible
to externally compensate on-chip losses of the clock signal. Alternatively,
a more radical approach can be pursued in which flip-flops are only used for
synchronizing the MSB and LSB streams and the FFE is composed of analog
delay cells.
4.5.3 Evaluation of the selective falling-edge pre-emphasis circuit
Based on the paper published in Electronics Letters:
W. Soenen, J. Lambrecht, X. Yin, S. Spiga, M.-C. Amann, G. V. Steenberge,
P. Bakopoulos and J. Bauwelinck, “PAM-4 VCSEL driver with selective
falling-edge pre-emphasis,” in Electronics Letters, vol. 54, no. 3, pp.
155-157, Feb. 2018
Introduction
The complete driver IC is implemented as a dual-channel device in a 0.13 µm
SiGe BiCMOS process and measures 1 × 2.8 mm². The driver output is wire
bonded to a VCSEL array developed by TUM, as shown in Fig. 4.43. The
1.5 µm VCSEL is characterized with a maximum bandwidth of 20.6 GHz and
an optical power of 3.9 mW occurring at a roll-over current of 15 mA [60].
The driver and the VCSEL are operated from a 2.5 V and a 3.4 V supply
voltage respectively. The single-ended MSB and LSB binary inputs of
the driver are excited with a 25 Gb/s 400 mVpp 2⁷−1 PRBS. Both streams
are decorrelated from each other by using a cable with approximately a 12
bit-period delay. The optical output of the VCSEL is aligned to a flat-cleaved
SMF and propagates through a variable optical attenuator before arriving
at a 32 GHz linear photoreceiver (DSC-R409). The differential receiver
output is captured with a 160 GS/s real-time oscilloscope to allow off-line
decoding of the PAM-4 signal. Note that no receiver equalization or
other digital signal processing is applied before calculating the BER
related to the LSB and MSB streams. We first optimized the equalizer and
SFEP settings for minimum BER. Afterwards, SFEP was disabled and the
bias current increased to maintain the same average current of 10.4 mA
through the VCSEL. This methodology allows the impact of SFEP on the system
to be extracted while the VCSEL is biased at a constant modulation bandwidth.
Results and discussion
Figure 4.43: PAM-4 driver wire bonded to a 1.5 µm VCSEL array and mounted on a printed
circuit board with single-ended 50 Ω transmission lines.

Figure 4.44: Captured eye diagrams using real-time oscilloscope at 25 GBd with annotated
decision thresholds for 1000 symbols. (a) Equalization with 4-tap FFE (b) Equalization with
4-tap FFE and SFEP expands lower eye. (Plots: voltage (V) versus time (ps).)

Captured eye diagrams at 25 GBd shown in Fig. 4.44 reveal that SFEP
combined with a 4-tap FFE is able to expand the lower eye while
simultaneously improving the vertical alignment of the subsequent eyes. This is
also reflected in the BER plots of the LSB and MSB data streams, plotted
in Fig. 4.45, of which the total BER is dominated by the LSB data. The
MSB data, which directly maps to the middle eye, becomes error-free above
−5 dBm. With a record length of 1.25 × 10⁷ bits, this corresponds to an upper
BER limit of 2.9 × 10⁻⁷ with 95% confidence. Considering a KP4 FEC
threshold of 2.2 × 10⁻⁴ [11], enabling SFEP can increase the link budget
by >1 dB. The transmitter power dissipation amounts to 455 mW, of which
only 18 mW originates from the SFEP circuit.
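A common way to turn an error-free record into an upper BER bound is the zero-error estimate BER < −ln(1 − CL)/N, with CL the confidence level and N the number of received bits. The sketch below applies it purely as an illustration. Whether the limit quoted above was obtained with exactly this estimator and bit count is not stated in the text, so the result should only be expected to agree in order of magnitude; the function name is invented for this example.

```python
import math


def zero_error_ber_bound(n_bits, confidence=0.95):
    """Upper BER bound when n_bits are received without a single error."""
    return -math.log(1.0 - confidence) / n_bits


# Record length taken from the text; the estimator itself is an assumption.
print(f"{zero_error_ber_bound(1.25e7):.2e}")
```

The bound scales inversely with the record length, which is why longer captures are needed to claim error-free operation at lower BER levels.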
Figure 4.45: Total BER dominated by LSB is lowered by approximately one order of
magnitude by enabling SFEP while the link budget is increased by >1 dB. (Plots: BER versus
average input power (dBm) with the KP4 FEC limit marked; (a) total BER for FFE and
FFE+SFEP, (b) BER of LSB and MSB with and without SFEP.)
4.5.4 Effect of a 4-tap FFE on a PAM-4 modulated 1.5 µm VCSEL at 25 GBd and 28 GBd
Based on the conference paper:
W. Soenen, R. Vaernewyck, X. Yin, S. Spiga, M.-C. Amann, G. V. Steenberge,
E. Mentovich, P. Bakopoulos and J. Bauwelinck, “56 Gb/s PAM-4 driver
IC for long-wavelength VCSEL transmitters,” in (2016) 42nd European
Conference on Optical Communications (ECOC), Dusseldorf, Germany:
VDE, Sept. 2016, Th.1.C.4, pp. 980-982.
Introduction
The VCSEL was developed by TUM and has a bandwidth of 22 GHz [32].
The buried tunnel junction diameter measures 4 µm and light is emitted at
a wavelength of 1533 nm. The threshold current is only 0.9 mA while the
rollover current is 9 mA, which corresponds to an optical power of 2.3 mW.
The differential series resistance is characterized as 62 Ω. Interconnection of
the driver IC to the printed circuit board and to the VCSEL occurs through
wire bonding and is depicted in Fig. 4.46.
Figure 4.46: Micrograph of the PAM-4 VCSEL driver IC wire bonded to a dual
long-wavelength VCSEL array from TUM and to differential transmission lines not optimized for
the single-ended driver inputs.
MSB and LSB inputs are excited with a single-ended AC-coupled 300 mV
2⁷−1 PRBS signal. Both streams are decorrelated from each other by
introducing a cable delay difference of 1 ns and an inversion. The full-rate
clock signal needs an output power of 2 dBm to bridge the losses introduced
by the differential board traces, which were not designed for single-ended
transmission. Despite alignment of the flat-cut fiber to the aperture of the
VCSEL, coupling losses of 8 dB were present in the setup, which forced
us to insert an erbium-doped fiber amplifier (EDFA) in the optical link,
as seen in Fig. 4.47. However, these losses could be reduced at a later
stage to omit the EDFA. The receiver side is composed of a linear 32 GHz
photodiode and transimpedance amplifier (DSC-R409) connected to the
differential inputs of a 160 GS/s real-time oscilloscope. The clock generator
is synchronized with the 10 MHz reference output from the oscilloscope to
avoid time shifting of the recorded waveform. The BER of the received
PAM-4 signal is calculated after recovering the MSB and LSB streams and
comparing them to the reference PRBS. BER plots are generated by varying
the received input power with a variable optical attenuator (VOA). To study
the impact of the multi-tap FFE on the link performance, the number of
taps is increased for each consecutive BER plot at 25 and 28 GBd. The
average VCSEL current is fixed at 6.3 mA and tap coefficients are scaled to
achieve an identical steady-state modulation amplitude for all BER plots. FFE
parameters are calculated offline by running a least-squares error algorithm
on the received PAM-4 signal, followed by a manual optimization.
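The effect of the tap scaling on the steady-state amplitude can be illustrated with a symbol-spaced FIR model of the FFE. This is a sketch under assumptions: A0 is taken as the main cursor and A1 to A3 as taps delayed by one, two and three symbols, which the text does not spell out, and the coefficients are the 25 GBd 4-tap values from Table 4.6. The function name is invented for this example.

```python
def apply_ffe(symbols, taps):
    """Symbol-spaced FIR: y[n] = sum_k taps[k] * x[n - k], with zero initial history."""
    out = []
    for n in range(len(symbols)):
        acc = 0.0
        for k, coefficient in enumerate(taps):
            if n - k >= 0:
                acc += coefficient * symbols[n - k]
        out.append(acc)
    return out


# Normalized 4-tap coefficients at 25 GBd from Table 4.6 (tap ordering is an assumption).
TAPS_25GBD_4TAP = [1.11, -0.035, -0.07, -0.015]

step = [0] * 4 + [1] * 8              # a single low-to-high transition
response = apply_ffe(step, TAPS_25GBD_4TAP)
print([round(value, 3) for value in response])
```

The first symbol after the transition is emphasized by the main tap, after which the output settles to the sum of the coefficients, which stays close to the unequalized amplitude of 1.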
Figure 4.47: Measurement setup for evaluating the BER of a 25 and 28 GBd PAM-4 signal
generated with the VCSEL driver. (Blocks shown: PPG SHF12100B at 25-28 Gb/s providing the
300 mV MSB and LSB streams with a 1 ns delay; CLK source Anritsu MG3696B at 25-28 GHz,
2 dBm; SMF, isolator, EDFA, VOA and DSC-R409 receiver; 160 GS/s 63 GHz real-time
oscilloscope with 10 MHz reference and trigger; MSB and LSB decoding without equalization.)
Results and discussion
Although the non-equalized NRZ eye diagrams for the MSB and LSB
components are clearly open at 28 GBd, this is unfortunately not the case for
the PAM-4 signal, see Fig. 4.48. Inter-symbol interference clearly has a
severe impact, which is attributed to the bias-dependent behavior of the
VCSEL. It is therefore only effective to optimize the FFE settings on the
PAM-4 signal instead of on the MSB and LSB components.
Figure 4.48: Captured eye diagrams at 28 GBd displaying the decomposition of the
non-equalized PAM-4 signal (1 tap) into its MSB and LSB component, together with the equalized
PAM-4 signal (4 taps). Panels: (a) 1 tap LSB, 30 mV/div, 10 ps/div; (b) 1 tap MSB,
60 mV/div, 10 ps/div; (c) 1 tap, 60 mV/div, 10 ps/div; (d) 4 taps, 65 mV/div, 10 ps/div.
Predistorting the output current of the VCSEL driver with a 4-tap FFE⁸
significantly reduces the ISI of the PAM-4 signal. This statement can also
be derived from the BER plots at 25 and 28 GBd in Fig. 4.49. Considering an
RS(544,514) KP4 FEC with a pre-FEC BER limit of 2.2 × 10⁻⁴ [11],
sensitivity is improved by 1.4 dB at 25 GBd and 3.8 dB at 28 GBd. Performance
at 0 dBm is well below the FEC limit with a BER smaller than 10⁻⁶, and error
free at 25 GBd. With a record length of 2.8 × 10⁷ bits, this corresponds to
an upper BER limit of 1.7 × 10⁻⁷ at a 95% confidence level. At both data
rates, the optimization of the 2-tap FFE forces coefficient A1 to zero, thereby
behaving identically to the 1-tap topology, while the transition from 3 to 4 taps
leads to a marginal improvement. This small difference is the result of
coefficient A2 dominating A3 in a back-to-back link as summarized in Table 4.6.
Opting for a 3-tap FFE at 56 Gb/s results in 7.9 pJ/bit, an improvement
over the first generation 2-tap PAM-4 driver (9.4 pJ/bit) [46]. It is twice
as efficient as a competing, though not power-optimized, 56 Gb/s NRZ
long-wavelength VCSEL driver [53]. NRZ drivers are still capable of
consuming less power because only a single data path is required, e.g.
3.8 pJ/bit is possible at 50 Gb/s while also operating error free [55]. Above
all, there is a large discrepancy noticeable between the electrical and optical
performance of PAM-4 drivers. Impressive data rates of 90 Gb/s and
100 Gb/s are demonstrated electrically without equalization; however, these
quickly degrade to 40 Gb/s and 56 Gb/s while driving a VCSEL [54, 56].
⁸ By default, SFEP is enabled for each measurement.
Figure 4.49: BER plots vs. number of FFE taps at (a) 25 GBd and (b) 28 GBd. (Plots: BER
versus average input power (dBm) for 1,2 taps, 3 taps and 4 taps, with the FEC limit marked.)
Table 4.6: Overview of normalized FFE coefficients relative to A0 in a 1-tap configuration
and transmitter power dissipation.

Data rate  Taps  A0    A1      A2     A3      P (mW)
25 GBd     1, 2  1     0       0      0       354
25 GBd     3     1.11  0       -0.11  0       432
25 GBd     4     1.11  -0.035  -0.07  -0.015  472
28 GBd     1, 2  1     0       0      0       374
28 GBd     3     1.11  -0.035  -0.14  0       442
28 GBd     4     1.11  0       -0.14  0.035   465
The energy efficiencies reported for these drivers are below 4.5 pJ/bit, but are misleading
since they do not include a retiming chain to synchronize the MSB and LSB
data streams. A similar discrepancy is also present in the PAM-4 driver
of this work (36 GBd electrical versus 28 GBd optical) and proves that
multi-level modulation is not trivial when modulating a non-linear device.
Better performance could be achieved by removing the fixed 2:1 relation
between the MSB and LSB data paths, thereby allowing the pulse shape of each
eye to be optimized separately. This approach of course doubles the degrees of
freedom and makes it harder to find the optimal solution.
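The energy-per-bit figures quoted above follow directly from the power dissipation in Table 4.6 and the gross bit rate. A minimal sketch, assuming two bits per PAM-4 symbol and an invented function name:

```python
def energy_per_bit_pj(power_mw, baud_rate_gbd, bits_per_symbol=2):
    """Energy per bit in pJ; mW divided by Gb/s is numerically equal to pJ/bit."""
    bit_rate_gbps = baud_rate_gbd * bits_per_symbol
    return power_mw / bit_rate_gbps


# Power values taken from Table 4.6 at 28 GBd (56 Gb/s PAM-4).
for taps, power_mw in [(3, 442), (4, 465)]:
    print(taps, "taps:", round(energy_per_bit_pj(power_mw, 28), 1), "pJ/bit")
```

The 3-tap entry reproduces the 7.9 pJ/bit stated in the text, and the 4-tap entry shows the cost of the extra tap at the same bit rate.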
4.6 Conclusion
The comprehensive design flow that was constructed for the retiming
circuitry resulted in a PAM-4 driver IC that was able to operate consistently
at data rates of 36 GBd in a BTB electrical link. This is definitely an
improvement over the first generation transmitter, which did not always manage
to reach 32 GBd, even though the depth of its equalizer was limited to two
taps. The PAM-4 modulation format combined with a 4-tap FFE architecture
results in a high-speed data path requiring 8 output drivers, 8 predrivers and
10 flip-flops. The occupied area was significantly reduced by placing
the biasing stages outside of the data path. To accommodate this feature, a
generic reusable current mirror was designed that was compatible with the
bias current specifications of each circuit. Through a co-simulation approach
between Cadence and ADS Momentum, it was shown that the wide bias
current buses routed from the generic current mirror to the high-speed
circuits had no negative impact on the signal integrity of the modulation current.
Unfortunately, the ultimate baud rate of 40 GBd listed in Table 4.2 was not
feasible. Experimental verification of the synchronization window versus
clock amplitude, data rate and number of enabled tap drivers revealed that
the drive strength of the last clock buffer is not sufficient at the maximum
data rate when more than one tap is enabled. The data rate further drops to
28 GBd when modulating long-wavelength VCSELs. Even though the 4-tap
FFE lowers the BER by more than two orders of magnitude below the KP4
FEC threshold, the non-linear modulation behavior of the VCSEL makes
it very difficult to effectively equalize all transitions in a multi-level signal.
Applying selective asymmetric pre-emphasis techniques to each level could
lead to better performance, as illustrated by the SFEP concept. Possibly,
power consumption could be lowered if it replaced the symbol-spaced
equalizer, as fewer flip-flops and clock buffers would be required.
At 25 GBd, the PAM-4 VCSEL driver consumes over 430 mW. In contrast,
40 Gb/s and 50 Gb/s NRZ VCSEL drivers consume less than 200 mW while
operating error-free [47, 55]. Therefore, PAM-4 modulated 1.5 µm VCSEL-based
links are only advantageous in applications where chromatic dispersion
limits the transmission distance [46]. However, even this bottleneck for NRZ
can be avoided when opting for 1.3 µm VCSELs, which was demonstrated in
combination with the driver described in Chapter 5, as part of the bilateral EU
FP7 project RAPIDO [28]. That is the reason why NRZ modulated 1.3 µm
VCSEL links are the preferred technology for energy-efficient 400GbE data
center intraconnects.

5 Pseudo-balanced regulated VCSEL driver
The reader may have observed in Chapter 4 and in the VCSEL driver literature
that the anode voltage of the laser is typically chosen higher than the supply
voltage of the driver IC. This is primarily the case for back-terminated drivers
where the anode voltage affects the average VCSEL current. In order for
the VCSEL current to be solely determined by the drive current, the anode
voltage needs to be set to a specific and relatively high value. When targeting
energy-efficient ICs for future data center intraconnects, it is beneficial
to lower the anode voltage as much as possible. Ultimately,
single-supply operation of the VCSEL driver/link is pursued. This chapter
describes in more detail the constraint on the anode voltage and the problems
associated with conventional topologies followed by a presentation of the
developed VCSEL driver for the PHOXTROT project. The main innovations
in this driver are a novel output stage that achieves low-power operation by
removing the anode voltage constraint and an on-chip asymmetric FFE that
can compensate the non-linear modulation behavior of the VCSEL.
5.1 DC-coupled vs. AC-coupled laser driver
High-speed VCSEL drivers that directly modulate the laser current are
available in two configurations: anode drive and cathode drive. Anode drive
has the potential to lower the supply voltage of the VCSEL driver, whereas
cathode drive avoids the use of slower p-type transistors in the high-speed
path. Both still have one thing in common though, namely that the VCSEL
driver is operated from multiple supply voltages to lower power consumption.
VCSEL supply voltages can range from 3.3 V to 5.8 V [39, 41, 42, 53]. In
this chapter, we will further focus on cathode drive, because the
long-wavelength VCSEL arrays of MIRAGE and PHOXTROT were constructed
with a common anode, and propose a solution to get rid of the multiple
supply voltages.

Figure 5.1: Diagram of DC-coupled laser driver showing the relation between modulation
current Im, bias current Ib1 and average laser current Īl.
High-speed laser drivers are typically implemented with a back-termination
resistor to move the dominant pole at the driver's output to higher frequencies
or to reduce reflections in case of a transmission line connection between the
driver and the VCSEL. This back-termination resistor Rt not only reduces
the drive efficiency (m) to the laser, but also introduces a DC current path
between the anode voltage of the laser Vdd2 and the supply voltage of the
driver Vdd1, see Fig. 5.1 and Eqs. (5.1) and (5.2).
\[
\bar{I}_l = m\left(I_b + \frac{I_m}{2}\right) + \frac{V_{dd2} - V_{dd1} - V_l}{R_l + R_t}
          = m\left(I_b + \frac{I_m}{2}\right) + I_{extra} \tag{5.1}
\]
\[
m = \frac{R_t}{R_t + R_l} < 1 \tag{5.2}
\]
Figure 5.2: Diagram of AC-coupled laser driver showing the relation between modulation
current Im, bias current Ib2 and average laser current Īl.

The extra current Iextra is proportional to the difference between the supply
voltages Vdd2 and Vdd1 and can be canceled if Vdd2 equals the sum of Vdd1
and the forward threshold voltage drop of the laser Vl. If Vdd2 does not meet
this constraint, it can result in either a negative or a positive extra current, of
which the value is hard to predict or control because of the dependence on
Rt and on the series resistance Rl of the laser. The Iextra contribution can
be easily eliminated by AC-coupling the laser to the driver, as depicted in
Fig. 5.2. The bias current Ib2 represents the average laser current Īl and
is therefore set at a higher value compared to the bias current Ib1 in the
DC-coupled configuration. At first sight, this suggests a
higher power dissipation, but this topology is able to operate at a lower Vdd2.
The anode supply voltage is chosen either to eliminate Iextra for the DC-case
resulting in Eq. (5.5) or to pursue single-supply operation (Vdd2 = Vdd1) for
the AC-case leading to Eq. (5.7). A fair comparison is made by writing Ib2 in
function of Ib1 and Im in order to guarantee equal laser operating conditions.
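\begin{align}
P_{DCcoupled} &= V_{dd2}\, m \left(\frac{I_m}{2} + I_{b1}\right)
                + V_{dd1}\,(1 - m)\left(\frac{I_m}{2} + I_{b1}\right) + I_{extra} \tag{5.3} \\
              &= \left(V_{dd1} + m V_l\right)\left(\frac{I_m}{2} + I_{b1}\right)
                \qquad \left[V_{dd2} = V_{dd1} + V_l \rightarrow I_{extra} = 0\right] \tag{5.4} \\
              &= V_{dd1}\,\bar{I}_l \left(\frac{1}{m} + \frac{V_l}{V_{dd1}}\right) \tag{5.5}
\end{align}
\begin{align}
P_{ACcoupled} &= V_{dd1}\left(\frac{I_m}{2} + I_{b2}\right)
                \qquad \left[V_{dd2} = V_{dd1}\right] \tag{5.6} \\
              &= V_{dd1}\left((1 + m)\frac{I_m}{2} + m I_{b1}\right)
                \qquad \left[I_{b2} = m\left(I_{b1} + \frac{I_m}{2}\right)\right] \tag{5.7}
\end{align}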
Assuming that the laser is biased in the linear region of the PI-curve, the ER
can be included in the equations to fix the relation between Im and Īl, see
Eq. (5.8). Finally, Eq. (5.9) describes the ratio of the power consumption
formulas and effectively shows when the AC-coupled configuration is less
energy efficient than the DC-coupled configuration as a function of the ER
and m.
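\[
ER = \frac{P_1}{P_0}
   = \frac{\bar{I}_l + m\frac{I_m}{2} - I_{th}}{\bar{I}_l - m\frac{I_m}{2} - I_{th}},
   \qquad P = \eta\,(I - I_{th}) \tag{5.8}
\]
\[
\frac{P_{ACcoupled}}{P_{DCcoupled}}
   = \frac{m + \frac{\bar{I}_l - I_{th}}{\bar{I}_l}\,\frac{ER - 1}{ER + 1}}
          {1 + m\,\frac{V_l}{V_{dd1}}} \tag{5.9}
\]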
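The step from Eqs. (5.5), (5.6) and (5.8) to Eq. (5.9) is not written out. One way to reconstruct it, assuming Ib2 equals the average laser current Īl as stated for the AC-coupled case, is the following:

```latex
\begin{align*}
m\,\frac{I_m}{2} &= \left(\bar{I}_l - I_{th}\right)\frac{ER - 1}{ER + 1}
  && \text{(Eq. (5.8) solved for } m I_m / 2\text{)} \\
P_{ACcoupled} &= V_{dd1}\left(\bar{I}_l + \frac{I_m}{2}\right)
  = \frac{V_{dd1}\,\bar{I}_l}{m}
    \left(m + \frac{\bar{I}_l - I_{th}}{\bar{I}_l}\,\frac{ER - 1}{ER + 1}\right)
  && \text{(Eq. (5.6) with } I_{b2} = \bar{I}_l\text{)} \\
P_{DCcoupled} &= \frac{V_{dd1}\,\bar{I}_l}{m}\left(1 + m\,\frac{V_l}{V_{dd1}}\right)
  && \text{(Eq. (5.5) rewritten)}
\end{align*}
```

Dividing the second line by the third cancels the common factor Vdd1 Īl / m and leaves the ratio of Eq. (5.9).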
Figure 5.3: Power consumption comparison between an AC-coupled VCSEL driver interface
and (a) a conventional DC-coupled topology (b) an ideal DC-coupled topology versus drive
efficiency m for different extinction ratios. (Plots: PACcoupled/PDCcoupled versus drive
efficiency m for ER values of 3 dB, 4 dB and 5 dB.)

Figure 5.3a plots Eq. (5.9) for typical ER values between 3 dB and 5 dB while
assuming Īl = 6 mA and Ith = 1 mA. Supply voltage Vdd1 was chosen as
2.5 · Vl to complete Eq. (5.5) and corresponds to a commonly used
2.5 V when Vl = 1 V. The chart reveals that AC-coupling is more efficient
than DC-coupling for general VCSEL applications where the drive
efficiency is less than 90% and the ER is below 5 dB. While AC-coupling removes the
constraint on the anode voltage, it introduces another constraint on the
low-frequency content of the signal that directly impacts the required area of the
capacitor. More specifically, the high-pass 3-dB cutoff frequency is dictated
by the parallel equivalent of Rt and Rl. A quick numerical example,
assuming a data rate of 40 Gb/s, 65 consecutive identical digits (CID)¹,
Rt = 100 Ω, Rl = 50 Ω and allowing a power penalty of 0.1 dB, results in a
capacitor C of 4 nF. If realized with metal-insulator-metal capacitors that are
specified at 2 fF/µm², as in the available 0.13 µm SiGe BiCMOS technology,
this would occupy an area of 1.4 × 1.4 mm², which is not feasible to integrate
with the driver circuits in a channel pitch of 300 µm. The alternative is
to utilize external coupling capacitors, but this may be prohibited by
dense packaging requirements.

¹ The Gigabit Ethernet standards support 64B/66B encoding to limit the number of CID
to a theoretical maximum of 65 bits [11].
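The area figure in the numerical example above can be checked with a few lines. The sketch also estimates the droop over 65 CID with a first-order high-pass model whose time constant is the parallel equivalent of Rt and Rl times C; that model is an assumption made here for illustration, since the text does not state which penalty formula was used. The function names are invented for this example.

```python
import math


def mim_capacitor_side_mm(capacitance_f, density_f_per_um2):
    """Side length in mm of a square MIM capacitor with the given density."""
    area_um2 = capacitance_f / density_f_per_um2
    return math.sqrt(area_um2) / 1e3


def cid_droop(capacitance_f, r_t, r_l, n_cid, bit_rate):
    """Relative droop after n_cid identical bits for a first-order high-pass (assumed model)."""
    r_parallel = r_t * r_l / (r_t + r_l)
    tau = r_parallel * capacitance_f
    return 1.0 - math.exp(-(n_cid / bit_rate) / tau)


side = mim_capacitor_side_mm(4e-9, 2e-15)      # 4 nF at 2 fF/um^2
droop = cid_droop(4e-9, 100.0, 50.0, 65, 40e9)
print(f"side = {side:.2f} mm, droop after 65 CID = {droop * 100:.1f} %")
```

The side length of roughly 1.4 mm is several times the 300 µm channel pitch, which is the reason on-chip AC-coupling is ruled out.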
Ideally, a DC-coupled architecture should be implemented that enables
single-supply operation (Vdd2 = Vdd1) while simultaneously nulling Iextra.
Equation (5.12) expresses the power consumption ratio and is plotted in
Fig. 5.3b.
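\begin{align}
P_{DCcoupled} &= V_{dd1}\left(\frac{I_m}{2} + I_{b1}\right)
                \qquad \left[V_{dd2} = V_{dd1},\; I_{extra} = 0\right] \tag{5.10} \\
              &= V_{dd1}\,\frac{\bar{I}_l}{m} \tag{5.11}
\end{align}
\[
\frac{P_{ACcoupled}}{P_{DCcoupled}}
   = m + \frac{\bar{I}_l - I_{th}}{\bar{I}_l}\,\frac{ER - 1}{ER + 1} \tag{5.12}
\]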
It can be observed that in this particular case, the ideal DC-coupled
architecture outperforms the AC-coupled variant already from m = 50%
at an ER of 5 dB. In the next sections of this chapter, a circuit will be
proposed, as part of the PHOXTROT VCSEL driver, that approximates
the ideal DC-coupled architecture.
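The two power ratios, Eq. (5.9) and Eq. (5.12), are easy to evaluate for the operating point used in Fig. 5.3 (Īl = 6 mA, Ith = 1 mA, Vdd1 = 2.5 · Vl). The sketch below assumes the ER in dB is a power ratio, so that ER = 10^(dB/10); the function names are invented for this example.

```python
def er_term(er_db, i_avg=6e-3, i_th=1e-3):
    """Common term of Eqs. (5.9) and (5.12): ((Il - Ith) / Il) * (ER - 1) / (ER + 1)."""
    er = 10 ** (er_db / 10)          # assumption: ER in dB is a power ratio
    return (i_avg - i_th) / i_avg * (er - 1) / (er + 1)


def ratio_conventional(m, er_db, vl_over_vdd1=0.4):
    """Eq. (5.9): AC-coupled power relative to the conventional DC-coupled driver."""
    return (m + er_term(er_db)) / (1 + m * vl_over_vdd1)


def ratio_ideal(m, er_db):
    """Eq. (5.12): AC-coupled power relative to the ideal DC-coupled driver."""
    return m + er_term(er_db)


def break_even_m_ideal(er_db):
    """Drive efficiency at which Eq. (5.12) equals 1."""
    return 1 - er_term(er_db)


print(round(ratio_conventional(0.9, 5.0), 3))
print(round(break_even_m_ideal(5.0), 3))
```

A ratio below 1 means the AC-coupled driver consumes less. For the ideal DC-coupled driver the break-even drive efficiency at 5 dB ER comes out somewhat above one half, in line with the observation made from Fig. 5.3b.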
5.2 DC-coupled laser driver with output common-mode voltage control
Section 5.1 explained that a DC-coupled output stage is preferred for
multi-channel drivers. This configuration becomes even more attractive if the
constraint on the anode voltage can be eliminated by integrating an output
common-mode voltage control circuit. Laser or modulator drivers driving a
differential load typically utilize a voltage converter at the common node of
the termination resistors to control the DC output voltage [89–91]. Due to
the differential nature of the circuit, the current through the converter stays
constant. The converter is therefore isolated from the transient current and
does not influence the high-speed performance. This is, however, no longer
valid when driving single-ended loads, as drawn in Fig. 5.4.
Figure 5.4: Differential stage driving a single-ended laser results in a transient current
through the voltage converter.
The voltage converter creates a node voltage Vz that is lower than Vdd1. The
extra current term from Eq. (5.1) becomes zero when Vdd2 equals the sum of
Vl and Vz and thus allows lowering the anode voltage independent of Vdd1.
Unfortunately, the transient current through the voltage converter Iz equals
the transient current through the laser due to the asymmetric load and has a
peak-to-peak swing of m · Im. The dynamics of the voltage converter are
significantly slower than those of the differential driver A0. Hence, the settling time
of the converter will distort the eye diagram for patterns with long strings of
CID and results in increased ISI, as illustrated in Fig. 5.4. The ISI leads to a
deteriorated eye height and eye width. This introduces a power penalty at
the receiver side and makes the design of the CDR less robust. To preserve
the signal quality in the optical link, the DC-coupled laser driver needs
to be expanded with a circuit that minimizes the transient current through
the voltage converter. In addition, a control loop must be included in the
converter that adaptively cancels the extra current term depending on the
anode voltage.
5.3 Pseudo-balanced regulated laser driver
5.3.1 Concept
The first step in achieving high-speed performance is balancing the output
stage, thereby minimizing the transient current through the converter. The
easiest way to reach this goal is to connect an identical VCSEL to the dummy
branch of the differential pair, as first presented in [39]. Although effective
in balancing the stage, it is a costly solution as the second VCSEL
acts as a dummy device. Above all, the dual VCSEL approach holds back
dense integration of multi-channel arrays. These issues are avoided by the
balancing circuit presented in Fig. 5.5. This is a three-terminal control circuit
block that senses the voltages of both branches of the differential pair, Vx
and Vy, and sources a current Id into the dummy (left) branch to force Vx
equal to Vy. Consequently, the differential pair sees a symmetrical load at
DC. Because it is impossible to perfectly match the dummy with the load
impedance, the AC input impedance of the balancing circuit will differ from
that of the VCSEL, hence the name ‘pseudo-balanced’. The drive efficiency
to the dummy load will also deviate from m and is therefore referred to as
m′. Compared to the asymmetrical driver from Fig. 5.4, the value of the
peak-to-peak transient current through the voltage converter Iz is scaled
by a factor 1 − m′/m. In the ideal case where m′ = m, Iz remains at a constant
level, resulting in a laser current Il free from ISI introduced by settling time
effects, as illustrated in the eye diagram of Fig. 5.5.
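The benefit of balancing can be quantified with the swing expression from the text: m · Im for the asymmetrical driver, scaled by 1 − m′/m once the dummy branch is driven with efficiency m′. The sketch below uses hypothetical values for Im, m and m′, and the function name is invented for this example.

```python
def converter_swing_pp(i_m, m, m_dummy=0.0):
    """Peak-to-peak transient current through the voltage converter.

    m_dummy = 0 models the unbalanced driver (swing m * Im);
    m_dummy = m models the ideal balanced case (no swing).
    """
    return m * i_m * (1 - m_dummy / m)


I_M = 10e-3   # hypothetical modulation current
M = 0.6       # hypothetical drive efficiency to the laser

for m_dummy in (0.0, 0.54, 0.6):
    swing_ma = converter_swing_pp(I_M, M, m_dummy) * 1e3
    print(f"m' = {m_dummy}: Iz swing = {swing_ma:.2f} mA")
```

Even an imperfect match between m′ and m removes most of the transient current, which is the motivation for accepting a pseudo-balanced rather than a perfectly balanced stage.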
Figure 5.5: Concept of the pseudo-balanced laser driver showing the necessary elements to
equalize Vz to Vzl and m′ to m.

The second step to maximize performance requires the creation of a reference
voltage for the voltage converter in order to match Vz to Vzl. This is necessary
to eliminate Iextra from Eq. (5.1). As the supply voltage Vdd2 can vary
under different operating conditions, it is desired that node Vz follows
these variations. Besides Vdd2, the forward voltage of the laser Vl can also
slightly change over temperature or from device to device and should also
be tracked by the reference voltage. Both tracking conditions are satisfied if
a Vl replica circuit is inserted between the Vdd2 terminal and the reference
voltage terminal of the voltage converter. Maximum flexibility is obtained
by modifying the value of the replica voltage Vlr through a dedicated control
signal modVlr. This allows interfacing the driver to a broad range of lasers
or providing calibration over temperature and after fabrication. Finally, the
voltage converter will adjust its voltage drop until Vz = Vdd2 − Vlr. This
minimizes Iextra if Vlr ≈ Vl.
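The effect of the replica voltage on the residual extra current follows from Eq. (5.1) with Vdd1 replaced by the regulated node voltage Vz = Vdd2 − Vlr. A minimal sketch, with Rt = 100 Ω and Rl = 50 Ω borrowed from the earlier numerical example and hypothetical voltages; the function names are invented for this example.

```python
def regulated_node_voltage(v_dd2, v_lr):
    """Regulator target: Vz = Vdd2 - Vlr."""
    return v_dd2 - v_lr


def residual_extra_current(v_dd2, v_l, v_lr, r_t=100.0, r_l=50.0):
    """Extra-current term of Eq. (5.1) with Vdd1 replaced by the regulated Vz."""
    v_z = regulated_node_voltage(v_dd2, v_lr)
    return (v_dd2 - v_z - v_l) / (r_l + r_t)


# Hypothetical laser forward voltage of 1.0 V and a replica that is 30 mV too high.
for v_dd2 in (2.0, 2.5, 2.8):
    i_extra_ua = residual_extra_current(v_dd2, 1.0, 1.03) * 1e6
    print(f"Vdd2 = {v_dd2} V -> residual Iextra = {i_extra_ua:.0f} uA")
```

The residual current depends only on the mismatch Vlr − Vl and not on Vdd2 itself, which is what allows the anode voltage to be lowered without disturbing the average laser current.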
5.3.2 2-tap equalizer output stage
The implementation of the pseudo-balanced regulated concept in the
PHOXTROT driver is depicted in Fig. 5.6. It comprises a 2-tap FFE,
denoted by output drivers A0 and A1, whose outputs are combined at the nodes Vx and
Vy. A common-emitter differential pair (Q0, Q1) with a preset tail current
voltage (Vtailm0, Vtailm1) was used to minimize the required headroom at the
output nodes. Since this topology is heavily affected by the Miller effect,
capacitors C1 are cross-coupled between the input and output nodes of driver A0,
introducing some positive feedback. The Miller effect can be mitigated by
choosing C1 slightly smaller than the base-collector capacitance of Q0. The
voltage replica, designed as a programmable Vbe-multiplier, together with
amplifier A0 connected to NMOS transistor M0 in a series-shunt feedback
configuration, composes the voltage regulator that regulates common-node
voltage Vz. By bonding the anode voltage of the laser back to the chip, the
regulator will modify Vz until it approximates Vzl, thereby effectively canceling
the extra current through the VCSEL. A short high-frequency return-current
loop for the drive current is accomplished by providing on-chip decoupling
C0 for the anode voltage Vdd2, in addition to external decoupling capacitors.

Figure 5.6: Implementation of the pseudo-balanced regulator in a 2-tap equalizer output
stage.
Amplifier A1 combined with transistor Q2 sources a current into the dummy
branch of the differential pairs to balance node voltage Vx with Vy and thus
reduces the transient current through M0. The negative low-level current
through the balancer mentioned in Fig. 5.5 is compensated by connecting a
k-scaled copy of the VCSEL bias current source Ib to the emitter of Q2. The
scale factor k is chosen slightly higher than (1 − m′) to ensure a positive
bias for Q2 under all circumstances. Dummy resistor Rd is added to prevent
breakdown of transistor Q2. Ideally, the transient current through transistor
M0 should be constant, which is very hard to accomplish. Nonetheless,
inspection of the simulated current through M0 in Fig. 5.7 shows that
the pseudo-balanced circuit approaches this goal quite closely.
Indeed, the peak-to-peak current is less than 9% of the average current. In
contrast, the unbalanced topology, which is simply the same circuit from
Fig. 5.6 but with the balancer omitted, experiences a 65% swing relative
to the average current. Figure 5.8 confirms that a large transient current
swing through the converter has a detrimental impact on the high-speed
performance, necessitating the use of a balancer for 40 Gb/s operation.
Figure 5.7: Transient current flowing through node Vz has a much smaller peak-to-peak
swing for the pseudo-balanced configuration compared to the unbalanced configuration.
(Plot: current (mA) versus time (ns) for the unbalanced and balanced configurations.)
Figure 5.8: 40 Gb/s 2⁹ − 1 PRBS simulation of (b) a pseudo-balanced regulated output stage
circuit driving a 50 Ω load showing significant improvements compared to (a) an unbalanced
regulated output stage. (Plots: voltage (V) versus time (ps).)
The voltage regulation capabilities of the PHOXTROT driver measured with
Pseudo-balanced regulated VCSEL driver 121
[Figure 5.9 plot: average VCSEL current Il (mA) versus anode supply voltage Vdd2 (V) for the regulated driver and the conventional driver.]
Figure 5.9: The regulated output stage is capable of delivering a certain operating current to
the VCSEL (10mA in this case) at a lower anode voltage than a conventional cathode-drive
stage, even though both output stages are configured for equal drive current.
an SC VCSEL biased at 10mA can be observed in Fig. 5.9. Notice that
the VCSEL current is almost constant across the anode voltage range of
2-2.8V while the supply voltage of the driver is set at 2.5V. The obtained
result at DC suggests that the novel topology for the output stage allows
single-supply operation at 2.5V of the VCSEL driver. This, however, still
needs to be verified during high-speed operation, which will be covered in
the next section. Note that a conventional back-terminated cathode-drive
output stage requires a Vdd2 of 3.6V to accomplish the same VCSEL current,
unless the drive current is increased. The conventional
driver was simulated with the non-linear VCSEL model found in Fig. 2.9.
5.3.3 Common-mode output voltage regulated predriver
One of the reasons that the output stage maintains drive current stability
across a wide range of anode voltages is the low headroom required by the
differential pairs A0 and A1. More specifically, the headroom across the
switching transistors Q0 and Q1 is relaxed by minimizing the voltage across
the tail current sources (Vtailm0, Vtailm1). This functionality is realized by
a predriver, driving the inputs of output stage A0 (A1), that regulates its
common-mode output voltage with respect to a desired Vreftail, see Fig. 5.10.
A voltage regulator constructed in a series-shunt feedback configuration
around NMOS transistor M0 and amplifier A0 can equalize Vtailm0 (Vtailm1)
to Vreftail by incorporating the switching transistors (Q1) from the output
Figure 5.10: Predriver (for output stage A0) that regulates the common-mode output voltage
V̄o with respect to a desired Vreftail.
stage inside the feedback loop. The voltage reference Vreftail, generated
with a Vbe-multiplier, varies from 270mV to 330mV under process and
temperature variations. The large input capacitance of the output stage A0
(A1) is buffered using emitter followers Q1. The current through these
buffers was reduced by cross-coupling the cascode transistors Q2 of the
current source Ief0, introducing additional peaking in the frequency response.
Finally, the drive strength of the predriver is dynamically adapted based on
the modulation current of the output stage to improve switching efficiency.
5.4 Channel architecture
The complete architecture of the VCSEL driver channel is drawn in Fig. 5.11.
Two supply voltage terminals Vdd1 and Vdd2 are present for the driver and the
VCSEL respectively. This makes it possible to verify the impact of Vdd2 on the static
and dynamic drive characteristics as well as to provide testing flexibility
for out-of-range drive currents. The pseudo-balanced regulator combines
the currents at the output nodes to construct a 2-tap FFE filter response
determined by tap coefficients A0 and A1. The driver consists of an output
driver and a predriver and is responsible for the voltage-to-current conversion.
The delay and the polarity of the pre-emphasis pulse for A0 and A1 can
be controlled by a cascade of three delay stages. As already demonstrated
by [41], explicitly introducing PWD in the data path A1 can result in an
asymmetric equalized response. For this work, the asymmetric behavior is
further expanded by also introducing PWD in the main data path A0. A
CTLE interfaces the delay stage cascade to the predriver and is convenient
for compensating group delay variation created by the delay stages or for
boosting the amplitude of a bandwidth limited input signal. The differential
input signal is delivered to the first delay stage through two 50 Ω on-chip
transmission lines.
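
The 2-tap FFE described above can be illustrated with a symbol-rate sketch in Python. This is a simplified model under stated assumptions: one sample per bit, a tap spacing of exactly one bit period, the second tap subtracted from the main tap, and no PWD. The function name and the coefficient values are hypothetical; the actual driver sets the delay, polarity and PWD in analog circuitry.

```python
def two_tap_ffe(bits, a0, a1, precursor=False):
    """Combine a main tap and one neighbouring tap, one sample per bit.

    Bits are mapped to +1/-1 symbols. With precursor=False the second tap
    uses the previous symbol (postcursor); with precursor=True it uses the
    next symbol. Symbols beyond the sequence edges are treated as 0.
    """
    symbols = [1.0 if b else -1.0 for b in bits]
    out = []
    for n in range(len(symbols)):
        k = n + 1 if precursor else n - 1
        neighbour = symbols[k] if 0 <= k < len(symbols) else 0.0
        out.append(a0 * symbols[n] - a1 * neighbour)
    return out


bits = [0, 0, 1, 1, 1, 0]
print(two_tap_ffe(bits, a0=1.0, a1=0.25))                  # postcursor
print(two_tap_ffe(bits, a0=1.0, a1=0.25, precursor=True))  # precursor
```

In the postcursor case the first bit after a transition is emphasized, whereas in the precursor case the last bit before a transition is emphasized.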
Figure 5.11: Channel architecture of the PHOXTROT driver highlighting the asymmetric
2-tap FFE structure, supported by introducing adjustable PWD in each path.
The complete architecture is simulated at 50Gb/s when driving a rate
equation model of the 19GHz SC VCSEL used in [47]. Post layout parasitics
were extracted for the driver and pseudo-balanced regulator from Fig. 5.11.
The effect of the wire bond is investigated by assuming an inductance of
0 pH and 200 pH while the bond pad capacitance is fixed at 120 fF. The
optimal equalizer settings above 40Gb/s use tap A1 as a precursor instead
of a post cursor. Eye diagrams are shown in Fig. 5.12 where the bias of
the VCSEL is fixed at 10.5mA. Modulating the VCSEL with an ideal 50 Ω
450mVpp voltage source reveals a reduced eye opening due to insufficient
bandwidth of the device, see Fig. 5.12a. The eye closure is even more severe
when the laser is modulated with the non-equalized driver. This is mainly
caused by the aggregate base-collector capacitance of the switching and bias
transistors that load the output node. Enabling precursor FFE, without
scaling the main cursor, reduces the ISI and slightly extends the vertical
eye opening. The peaking of the wire bond inductance can further improve
on this. However, the value of the inductance is very critical, since 200 pH
[Figure 5.12 plots: eye diagrams (a) to (d), optical power (mW) versus time (ps).]
Figure 5.12: Simulation of a 19GHz SC VCSEL biased at 10.5mA and modulated with (a)
ideal 50 Ω source (b) PHOXTROT driver without FFE (c) PHOXTROT driver with precursor
FFE (d) PHOXTROT driver with identical precursor FFE as in (c) and 200 pH bond wire
inductance.
already creates slightly too much overshoot leading to undesired transitions
inside the eye. Note that this inductance corresponds to a wire bond
length of less than 200 µm, which is not trivial to achieve.
5.5 Experiments
5.5.1 Electrical characterization
The experimental results obtained at 40Gb/s in a 50 Ω environment, shown
in Fig. 5.13, clearly demonstrate the asymmetric pre-emphasis functionality
by adjusting the PWD in the A0 and A1 output stage. During the probing
experiments, the limits of the electrical performance of the driver were tested
by operating at data rates of 40, 50 and 56Gb/s in a BTB connection to the
error analyzer. A BER<1E-12 was achieved for all three data rates while
using a PRBS of 2^7 − 1 as input data. The result before and after equalization
at 56Gb/s can be observed in Fig. 5.14. Transitioning to a much more
demanding PRBS of 2^31 − 1 introduces a penalty at 50Gb/s leading to a
BER<1E-7, while this penalty was absent at 40Gb/s.
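
To make the difference between the test patterns concrete, the Python sketch below generates a PRBS with a linear feedback shift register. The generator polynomial x^7 + x^6 + 1 is a commonly used choice for a 2^7 − 1 sequence and is an assumption here; the text does not state which polynomial the pattern generator used. The helper names are hypothetical.

```python
def prbs(order, taps, length=None):
    """Generate PRBS bits from a Fibonacci LFSR.

    order: register length; taps: feedback tap positions (1-based),
    e.g. (7, 6) for x^7 + x^6 + 1. The seed is all ones.
    """
    mask = (1 << order) - 1
    state = mask
    if length is None:
        length = mask  # one full period of a maximal-length sequence
    bits = []
    for _ in range(length):
        feedback = 0
        for t in taps:
            feedback ^= (state >> (t - 1)) & 1
        bits.append((state >> (order - 1)) & 1)  # output the oldest bit
        state = ((state << 1) | feedback) & mask
    return bits


def longest_run(bits):
    """Length of the longest run of identical consecutive bits."""
    best = run = 1
    for prev, cur in zip(bits, bits[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


seq = prbs(7, (7, 6))
print(len(seq), sum(seq), longest_run(seq))
```

A longer register produces longer runs of identical bits and therefore more low-frequency content, which is one reason why a 2^31 − 1 pattern is more demanding than a 2^7 − 1 pattern.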
Figure 5.13: Output of driver probed at 40Gb/s while using extreme settings for controlling
the PWD to demonstrate the asymmetric FFE functionality (100mV/div, 10 ps/div).
Figure 5.14: Output of driver probed at 56Gb/s using PRBS of 2^7 − 1 while applying (a) no
FFE and (b) 55% postcursor FFE (100mV/div, 5 ps/div).
5.5.2 A 40Gb/s 1.5 µm VCSEL link operating at a single supply
voltage of 2.5V
Based on the conference paper:
W. Soenen, B. Moeneclaey, X. Yin, S. Spiga, M.-C. Amann, C. Neumeyr,
M. Ortsiefer, E. Mentovich, D. Apostolopoulos, P. Bakopoulos and J.
Bauwelinck, “A 40-Gb/s 1.5-µm VCSEL link with a low-power SiGe VCSEL
driver and TIA operated at 2.5V,” in 2017 Optical Fiber Communications
Conference and Exhibition (OFC), Los Angeles, CA, USA: OSA, Mar. 2017,
pp. 1-3.
Introduction
We present a 0.13 µm SiGe BiCMOS driver and TIA IC that can operate a
single-mode 1.5 µm VCSEL link up to 40Gb/s using a single supply voltage
of 2.5V. The transmitter die measures 2.5 × 1.3 mm² and is designed as a
4-channel driver with a channel pitch of 300 µm. The driver outputs are
wire bonded to two different 2 × 1 single-mode VCSEL arrays developed by
Technische Universität München, see Fig. 5.15. Only the rightmost VCSEL
is used for the experiments of this paper. The double-mesa short-cavity
1543 nm device is characterized by a buried tunnel junction diameter of 5 µm,
a bottom-mesa diameter of 18 µm and a small-signal bandwidth in excess of
19GHz [60]. The maximum output power of 4.5mW occurs at a rollover
current of 18mA. The 0.13 µm SiGe receiver used in the experiments is
composed of a common-emitter active shunt feedback TIA followed by a
cascade of amplifiers [92]. The gain of the TIA and amplifier stages can be
adapted for linear or limiting operation. The receiver is a 2-channel TIA with
a channel pitch of 750 µm. The TIA measures 3.6 × 0.98 mm² and is wire
bonded to two separate photodiodes (PDs). The PD has an active area of
12 µm and a responsivity of 0.63A/W. The PD bandwidth is above 35GHz
for bias voltages between 2.5V and 3V.
The performance of the VCSEL link is tested at 28 and 40Gb/s using a
differential 600mV PRBS of 2^7 − 1 as input signal for the driver. The
length of the PRBS is intentionally limited to 2^7 − 1 in order to focus the
performance evaluation on the assembled SiGe driver circuit, since [28]
revealed that InP-based long-wavelength VCSELs suffer from a BER floor of
10^−10 at 40Gb/s using longer sequences such as PRBS31. Light is coupled
in and out of the optical devices using lensed fibers. Sensitivity curves
are measured BTB and over 100m and 500m of SMF. The concept of the
balanced regulated output stage inside the transmitter will also be verified.
Figure 5.15: Micrograph of the assembled VCSEL driver (left) and the TIA with PIN
photodetector (right).
Results and discussion
The VCSEL is biased at an average current of 12mA for all experiments.
An ER of 5.3 dB is achieved before equalization and reduces to 3.6 dB after
equalization. The transmitted and received eye diagrams at 28Gb/s and
40Gb/s are shown in Fig. 5.16 together with the BER curves. The TIA is
pushed to the limiting region to maximize the eye opening, resulting in a
differential swing of 400mVpp. Applying postcursor FFE improves the eye
height at 40Gb/s, although the performance improvement is clearer in the
BER curves, revealing a gain in sensitivity of 2.3 dB OMA at a BER below
10^−11. Increasing the data rate from 28Gb/s to 40Gb/s induces a power
penalty of 2 dB OMA. The VCSEL link is able to operate at a BER < 10^−12
over 500m at 28Gb/s and 100m at 40Gb/s. At 500m, the maximum optical
power coupled into the PD is limited by the setup, which prevents reaching a
BER below 10^−9. Figure 5.9 already showed that the transmitter can preserve
a constant current through the laser over a wide range of anode voltages.
The BER curves from Fig. 5.17 show that high-speed performance is also
only marginally affected when changing the anode voltage, provided that the
switching transistors are not pushed into saturation, as is the case at 2.3V.
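
The drop in ER after equalization also reduces the OMA available at a given average optical power. The Python sketch below evaluates the standard relation OMA = 2·Pavg·(ER − 1)/(ER + 1) for the two quoted ER values. It assumes that the average optical power is the same before and after equalization, which seems plausible at a fixed average bias current but is not stated explicitly in the text.

```python
import math


def oma_fraction(er_db):
    """Return OMA / P_avg = 2 (ER - 1) / (ER + 1) for an ER given in dB."""
    er = 10 ** (er_db / 10)
    return 2 * (er - 1) / (er + 1)


def oma_change_db(er_before_db, er_after_db):
    """Change in OMA (dB) at constant average power when the ER changes."""
    ratio = oma_fraction(er_after_db) / oma_fraction(er_before_db)
    return 10 * math.log10(ratio)


# ER values quoted in the text: 5.3 dB before and 3.6 dB after equalization.
print(f"{oma_change_db(5.3, 3.6):.2f} dB")
```

The sensitivity gain reported from the BER curves is expressed in OMA at the receiver, so it already includes this reduction in transmitted OMA.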
The power consumed by the receiver amounts to 175mW while the transmitter
dissipates 132mW at 28Gb/s and 172mW at 40Gb/s, leading to 8.7 pJ/bit
at 40Gb/s. Although the link is around 5 dB less sensitive than [39, 53],
our link is almost three times more energy efficient while operating at 2.5V
compared to the 4, 5 and 5.8V used in [39]. With respect to a 25Gb/s
common-cathode VCSEL link operating at 1 and 3.3V [41], our common-
anode link is 3 dB more sensitive at 28Gb/s with similar power consumption.
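
The energy-efficiency figures follow directly from the quoted power and data rate, since mW divided by Gb/s yields pJ/bit. A short Python check, using only numbers from the text (the function name is illustrative):

```python
def energy_per_bit_pj(power_mw, rate_gbps):
    """Energy per bit in pJ: power in mW divided by data rate in Gb/s."""
    return power_mw / rate_gbps


rx_mw = 175.0      # receiver power
tx_mw_40g = 172.0  # transmitter power at 40 Gb/s

link = energy_per_bit_pj(rx_mw + tx_mw_40g, 40.0)
tx_only = energy_per_bit_pj(tx_mw_40g, 40.0)
print(f"link: {link:.1f} pJ/bit, transmitter only: {tx_only:.1f} pJ/bit")
```

The link figure combines transmitter and receiver, while the transmitter-only figure corresponds to the 4.3 pJ/bit quoted in the concluding chapter.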
[Figure 5.16 plots: (a) eye diagrams; (b) BER versus optical modulation amplitude (dBm) for 28G BTB, 28G 500m, 40G BTB, 40G BTB FFE, 40G 100m FFE and 40G 500m FFE.]
Figure 5.16: (a) Transmitted eye diagrams (characterized with a photodiode (Optilab PD40)
that has a 4 dB peaked frequency response to obtain 36GHz of bandwidth, contributing
significantly to the overshoot) and received eye diagrams. (b) BER curves at 28Gb/s and
40Gb/s for various transmission distances.
[Figure 5.17 plot: BER versus optical modulation amplitude (dBm) for Vanode of 2.5 V, 2.3 V and 2.9 V.]
Figure 5.17: Impact of the VCSEL anode voltage at 40Gb/s.
Overall, the pseudo-balanced regulated transmitter excels in lowering the
required supply voltage of the laser, even when comparing to CMOS anode-
drive implementations in Fig. 5.18. According to Fig. 5.19, the presented
VCSEL driver is, in terms of energy efficiency, on par with the state-of-the-art
designed in bulk CMOS, SiGe and InP transistor technologies. However,
driver circuits developed in an advanced and expensive silicon-on-insulator
CMOS process are by far the most energy efficient solution because of
the small parasitic capacitance to the substrate and the fast low-threshold
transistors operating at sub-1V supply voltages [40, 49].
Figure 5.18: The discrepancy between the driver and the VCSEL supply voltage is present
in various transistor technologies, except in our design.
Figure 5.19: The presented VCSEL driver is, in terms of energy efficiency, on par with the
state-of-the-art designed in bulk CMOS, SiGe and InP transistor technologies.
5.6 Conclusion
This chapter started with a power consumption analysis of an AC-coupled
and DC-coupled resistively back-terminated VCSEL driver configuration
and concluded that AC-coupling is the most efficient variant because it can
operate at a lower anode voltage. However, it was highlighted that this
topology is not preferred for multi-channel transmitters as the coupling
capacitor requires quite some chip area to minimize the impact of the high-
pass behavior on low-frequency signals. To avoid using AC-coupling while
also minimizing the supply voltage and power consumption, a novel DC-
coupled topology was proposed that formed the basis of a patent application.
Basically, voltage regulation is added to the driver’s output stage to maintain
the bias current through the VCSEL independent of the value of the anode
voltage. Meanwhile, high-speed performance is preserved by balancing
the output stage and equipping it with an asymmetric FFE. Moreover, this
technique can also be applied to the output stage of the PAM-4 VCSEL
driver from Chapter 4. Electrical performance of the PHOXTROT VCSEL
driver was demonstrated up to 56Gb/s. On the other hand, 40Gb/s error-free
operation up to 500m was demonstrated in a custom 1.5 µm VCSEL-based
SiGe BiCMOS optical link [47]. Although not discussed in this dissertation,
a 1.3 µm VCSEL modulated with the presented driver demonstrated error-free
operation at 40Gb/s up to 4.5 km [28]. Since these experiments were
all performed running off a single 2.5V supply voltage, something which
not even CMOS anode-drive transmitters have accomplished yet, the novel
DC-coupled output stage paves the way for low-power VCSEL-based data
center intraconnects.

6
Conclusions and some prospects
Throughout this dissertation, several aspects concerning the development of
the respective high-speed VCSEL drivers—ranging from the modeling of
the laser to the AC-analysis of transistor circuits—are covered, which can
also be integrated in a more general design flow for laser diode drivers. The
first step in this process deals with the modeling of the laser, being in this
case a long-wavelength InP-based VCSEL with buried tunnel junction. Many
laser diode models have already been published, which differ in
complexity and intended use. However, not much is
reported yet on the modeling of VCSELs featuring a buried tunnel junction.
The key here is to find a modeling approach that accurately captures the
modulation behavior of the VCSEL with minimum effort, i.e. reducing the
number of experiments needed to extract the model parameters. Preferably, these
model parameters can be extracted independently from the manufacturer.
In Chapter 2, a model was proposed that fulfills these requirements since
it can be constructed solely from S-parameters and PIV-curves. It comprises
the cascade of a lumped electrical access network and a Verilog-A
implementation of the coupled rate equations. Both components are linked
to each other by the current flowing through a Shockley diode, which
is equivalent to the heterojunction in the active region of the laser. The
coupled rate equations describe the relation between the carrier density and
the photon density in the active region. This part is crucial to model the
non-linear modulation behavior of the laser and was verified successfully
against probing experiments on a VCSEL die.
The next step in the design flow consists of deriving a topology for the driver
IC that can efficiently modulate the laser at a certain data rate. Unfortunately,
a large design space arises, comprising the driver’s impedance, interconnect
inductance, bias current of the VCSEL, feed-forward equalization, etc.,
making it difficult to quickly find the optimum. Exploring this design space
using an electronic circuit simulator in combination with the Verilog-A VCSEL
model is not the most time-efficient solution. Hence, a linearized model
of the driver’s output stage interfaced to the VCSEL is created in Chapter 3,
which can be simulated using numerical computing tools such as Matlab or
Python. The impulse response extracted from the driver-VCSEL interface
network can be inserted into a linearized optoelectronic communication
link that e.g. also includes the impulse response of a photodetector. The
performance of the system can be evaluated by monitoring the MSE while
sweeping the model parameters. In addition, the driver model can be easily
expanded with an FIR filter, of which the tap coefficients can be optimized
using a least-squares solver. In the end, an optimum trade-off between
performance and energy efficiency can be chosen after sweeping multiple
parameters simultaneously. As VCSELs typically require a large bias current
to improve high-speed operation, the linearized model does not deviate
significantly from the rate equation model. Even though this statement
proves to be correct after comparing a simulation with measurement results
of a wire bonded VCSEL driver assembly, it is still highly recommended
to utilize the non-linear model when designing the driver at the transistor level.
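
The least-squares optimization of FIR tap coefficients mentioned above can be sketched in plain Python. The sketch assumes a symbol-spaced impulse response `h` and chooses the taps so that the equalized response approximates a single unit pulse; the impulse response values, the function names and the squared-error metric on the pulse response are illustrative assumptions, not the exact procedure or solver used in Chapter 3.

```python
def convolve(a, b):
    """Full discrete convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ls_ffe_taps(h, n_taps, delay):
    """Taps w minimising the squared error between h * w and a unit pulse."""
    rows = len(h) + n_taps - 1
    # Convolution matrix: column k holds h shifted down by k samples.
    H = [[h[r - k] if 0 <= r - k < len(h) else 0.0 for k in range(n_taps)]
         for r in range(rows)]
    target = [1.0 if r == delay else 0.0 for r in range(rows)]
    # Normal equations (H^T H) w = H^T target.
    HtH = [[sum(H[r][i] * H[r][j] for r in range(rows))
            for j in range(n_taps)] for i in range(n_taps)]
    Htd = [sum(H[r][i] * target[r] for r in range(rows))
           for i in range(n_taps)]
    return solve(HtH, Htd)


def squared_error(h, w, delay):
    """Squared error between the equalized response and a unit pulse."""
    c = convolve(h, w)
    return sum((v - (1.0 if i == delay else 0.0)) ** 2
               for i, v in enumerate(c))


h = [1.0, 0.5, 0.2]  # hypothetical symbol-spaced impulse response
w = ls_ffe_taps(h, n_taps=3, delay=0)
print(w, squared_error(h, [1.0], 0), squared_error(h, w, 0))
```

Sweeping `delay` selects between precursor and postcursor configurations, and the remaining error can be compared across sweeps in the same way the MSE is monitored in the linearized link model.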
The subsequent chapters of this dissertation elaborated on the architecture of
the driver ICs, which have the primary goal of overcoming the bandwidth
limitation introduced by the VCSEL transmitter assembly. Chapter 4 proposes
a topology that exploits PAM-4 to effectively double the spectral efficiency
of the modulated signal. However, since the multi-level signal is extremely
susceptible to the non-linearity of the VCSEL, equalization of the drive
current is mandatory to reduce the associated ISI. A 4-tap symbol-spaced FFE
output stage was implemented that is capable of compensating the ringing and
the insufficient rise time of the step response. The symbol-spaced tap delay
was realized using cascaded flip-flops, which also served to synchronize the
binary LSB and MSB input data streams. This chapter comprehensively
discussed the design methodology of this retiming chain since the metastable
flip-flop region introduces a hard constraint on the maximum achievable
data rate. In addition, a compact layout of the 4-tap FFE PAM-4 data
path was made possible by designing a generic reusable current mirror
that can be placed outside the high-speed data path area. Verification
through electromagnetic co-simulation of the long current routing buses
revealed negligible impact on the signal integrity of the output drive current.
Experiments at 28GBd (56Gb/s) conducted with a 1.5 µm 22GHz VCSEL
showcase a BER < 10^−6 at 0 dBm average optical power, which is well
below the KP4 FEC BER threshold of 2.2 × 10^−4. Power consumption of
the PAM-4 VCSEL driver assembly was equivalent to 7.9 pJ/bit. Aside from
the symbol-spaced FFE, the driver chip also featured a selective falling-edge
pre-emphasis circuit that was designated to restore the lower PAM-4 eye,
which is typically deteriorated due to the slower falling-edge transition. The
inclusion of this circuit led to a > 1 dB improvement in link budget at the
KP4 FEC limit for 25GBd operation, while accounting for only 4% of the
transmitter power dissipation.
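
The relation between the binary input streams, the symbol rate and the FEC threshold can be summarized in a few lines of Python. The binary-weighted mapping (level = 2·MSB + LSB) is an assumption made for illustration; the text only states that an MSB and an LSB stream are combined. The function names are hypothetical.

```python
KP4_FEC_BER_LIMIT = 2.2e-4  # threshold quoted in the text


def pam4_levels(msb_bits, lsb_bits):
    """Map two binary streams to PAM-4 levels 0..3 (binary weighting assumed)."""
    if len(msb_bits) != len(lsb_bits):
        raise ValueError("MSB and LSB streams must have equal length")
    return [2 * m + l for m, l in zip(msb_bits, lsb_bits)]


def bit_rate_gbps(baud_gbd, bits_per_symbol=2):
    """Bit rate in Gb/s for a given symbol rate in GBd."""
    return baud_gbd * bits_per_symbol


def within_fec_limit(ber, limit=KP4_FEC_BER_LIMIT):
    """True if the pre-FEC BER is below the FEC threshold."""
    return ber < limit


print(pam4_levels([0, 0, 1, 1], [0, 1, 0, 1]))
print(bit_rate_gbps(28), within_fec_limit(1e-6))
```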
Chapter 5 focuses on achieving low-power operation at 40Gb/s NRZ modulation.
More specifically, single-supply operation of the cathode-drive
VCSEL transmitter is pursued without affecting the DC-coupled laser drive
current. This is accomplished by equipping the laser driver with a novel
pseudo-balanced voltage-regulated output stage. The pseudo-balanced property
is necessary to reduce the transient peak-to-peak current through the
single-endedly loaded voltage-regulated output stage, thereby preserving
high-speed performance. Error-free operation of a 1.5 µm VCSEL-based
SiGe BiCMOS link is attained at 40Gb/s over 500m of SMF, while running
off a single power supply of 2.5V. The energy efficiency of the transmitter,
specified at 4.3 pJ/bit, is on par with the state of the art, while the laser
power supply voltage requirements are even lower than CMOS anode-drive
implementations. The same driver IC was also integrated with a 1.3 µm
InP-based VCSEL. The near-zero chromatic dispersion characteristics of a
standard SMF at this wavelength (O-band) are clearly favorable compared to emitting
in the C-band, since error-free transmission was achieved at 40Gb/s over 4.5 km.
Even though higher data rates were demonstrated in this work at C-band using
PAM-4 modulation, it is anticipated that NRZ modulated O-band VCSELs
will be the preferred candidate transmitter for spatial division multiplexed
400Gb/s Ethernet links. Of course, this statement assumes that the recently
developed techniques to increase the bandwidth of 1.5 µm InP-based BTJ
VCSELs, will be adopted by the 1.3 µm InP-based devices to achieve net
data rates of 50Gb/s. This would make it possible to omit the KP4 FEC integration, as
proposed by the 400GBASE-FR8 standard. Above all, using VCSELs for this
application would result in the most cost- and energy-efficient implementation
compared to directly/externally modulated edge-emitting lasers. However,
maintaining this data rate with sufficient link budget at temperatures up
to 70 °C case module temperature and 90 °C substrate temperature will be
the biggest challenge, as optical output power and modulation bandwidth
quickly degrade over temperature [9, 26, 72]. Transmit equalization will
play a big role in achieving this goal [94]. Longer distances up to 10 km,
as requested by the 400GBASE-LR8 standard, could be made possible by
inserting KP4 FEC functionality in the O-band VCSEL transmitter. Based
on the results obtained in this work and recent literature, it can be concluded
that PAM-4 modulation applied to VCSEL transmitters does not necessarily
lead to a higher data rate or better energy efficiency, especially when also
considering FEC for NRZ. However, further improvements are still possible
for PAM-4 VCSEL driver ICs by e.g. replacing the symbol-spaced equalizer
with an asymmetric level-selective pre-emphasis topology. It would also be
worth investigating and comparing the performance of multi-level and NRZ
modulated VCSEL transmitters for elevated temperatures, where the VCSEL
modulation bandwidth and output optical power become lower.
In the end, even 50GBd could become feasible with VCSELs, but the total
cost of the link could reach such a level, due to reduced reliability
or the integration of a digital signal processor in the receiver, that other
technologies such as silicon photonics provide a better alternative. A direct or
electro-absorption modulated III–V-on-silicon DFB laser already reached
56Gb/s NRZ, while the heterogeneous integration results in a compact
form factor [95]. On the other hand, optically generating a PAM-4 signal
avoids the linearity constraint on the electrical and optical devices.
Very promising results at 112Gb/s were demonstrated by binary modulating
two parallel electro-absorption modulators [96]. Ultimately, monolithic
integration of photonic devices with transceiver electronics is pursued. A
micro-ring modulator was successfully implemented alongside a driver circuit
in a 130 nm SiGe BiCMOS process, reaching data rates of 40Gb/s [97].
Another example reports 25Gb/s accomplished with a monolithically integrated
micro-ring modulator and a 130 nm silicon-on-insulator CMOS
inverter driver [98]. Before these silicon photonic solutions can be migrated
into data center intraconnects, the fiber-to-chip coupling will first need to
be optimized to reduce the coupling and polarization-dependent loss. Edge
couplers can reduce the coupling loss below 2 dB per coupler, but require
sub-µm alignment accuracy to keep the loss penalty below 3 dB [99],
while 2-D grating couplers are polarization independent and relax the fiber
alignment precision, but experience 7 dB of coupling losses per coupler [100].
It is clear by now that the optimal technology for > 50 GBd data center
transceivers will depend on a lot of parameters—including cost, reliability
and energy efficiency—as the system complexity expands to sustain higher
data rates. One thing is certain though: exciting times lie ahead...
A
Tunnel diode operation
A.1 Simplified band theory of solids
In semiconductors, insulators and conductors, electrons are situated in energy
bands, i.e. the valence band and conduction band, rather than discrete
energy states. For insulators, the valence band is full and the conduction
band is empty, indicating that there is no conduction of electrical current.
The band gap, which is the energy difference between the two energy
bands, is high enough to block thermal excitation of valence electrons. The
opposite occurs with conductors, where the conduction band overlaps
with the valence band, resulting in a permanent presence of free electrons.
Semiconductors observed at 0K reveal an empty conduction band and a full
valence band similar to insulators. However, the band gap is small enough
to allow electrons to jump to the conduction band under energy excitation,
e.g. thermal or photonic. An electrical current flow can therefore be created,
hence the name semiconductor. An important aspect when dealing with
energy band diagrams is the Fermi level dictated by the Fermi function
Eq. (A.1), which follows from Fermi-Dirac statistics. The Fermi function
describes the probability distribution of electrons across the energy states for
a certain temperature. The Fermi level EF corresponds to the hypothetical
energy state that has a 50% probability of being occupied by an electron. In
other words, most electrons are situated below the Fermi level as governed by
Eq. (A.1), see Fig. A.1 for a visual representation for multiple temperatures.
Figure A.1: Fermi function plotted on the energy band diagram versus temperature [101].
$$f(E) = \frac{1}{e^{(E - E_F)/kT} + 1} \tag{A.1}$$
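
Eq. (A.1) can be evaluated numerically to confirm the 50% occupation probability at the Fermi level. The Python sketch below expresses energies in eV; the function name and the example energies are illustrative choices.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def fermi(energy_ev, fermi_level_ev, temperature_k):
    """Fermi function of Eq. (A.1): occupation probability of a state."""
    if temperature_k <= 0:
        raise ValueError("temperature must be positive")
    x = (energy_ev - fermi_level_ev) / (K_B_EV * temperature_k)
    if x > 700:  # avoid overflow in exp(); the probability is effectively 0
        return 0.0
    return 1.0 / (math.exp(x) + 1.0)


e_f = 0.5  # hypothetical Fermi level in eV
for temperature in (100, 300, 600):
    print(temperature, fermi(e_f - 0.1, e_f, temperature),
          fermi(e_f, e_f, temperature), fermi(e_f + 0.1, e_f, temperature))
```

At higher temperatures the transition around the Fermi level becomes more gradual, while the value at the Fermi level itself remains 0.5.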
Silicon is characterized by four valence electrons and shares electron pairs
with four other silicon atoms through a covalent bond, thereby forming
a crystal lattice. Applying a voltage to the end contacts of the intrinsic
semiconductor can allow an electrical current to flow because of the inherent
low band gap. This current flow is composed of electron movement in
the conduction band and hole movement in the valence band since each
freed electron from a covalent bond creates a hole. The conduction of
holes and electrons can be greatly increased by adding impurities to the
crystal lattice in the form of dopants. N-type dopants such as arsenic or
phosphorus are called donor atoms because they contribute free electrons,
and P-type dopants such as gallium or boron contribute extra holes and
are called acceptor atoms. This creates extra electron energy levels near the
conduction band for N-type dopants and extra hole energy levels near the
valence band for P-type dopants, see Fig. A.2.
A.2 Tunnel diode
A tunnel diode is created if both sides of a pn-junction are heavily doped
resulting in a narrow depletion region [103]. Figure A.3 shows the energy
band diagram of a tunnel diode in thermal equilibrium. Notice that, compared
to Fig. A.2, the Fermi level is now located inside the energy band because of
the high doping concentration. The tunnel effect will be explained at 0K
Figure A.2: Fermi level can be placed closer towards conduction band or valence band
depending on the dopant type [102].
temperature. In this particular case, all energy states below the Fermi
level are filled and all energy states above the Fermi level are empty. In equilibrium, the
Fermi levels for both p-side and n-side are equal, resulting in zero current
flow. However, if a voltage bias is applied, the Fermi levels EFn and EFp
change relative to each other. For reverse bias, this means that EFn
is situated lower than EFp, resulting in empty energy states at the n-side
that align with filled states at the p-side. An electric field across the
depletion region is then capable of moving electrons from the p-side valence
band to the n-side conduction band, see Fig. A.4. This process is called
band-to-band tunneling and results in a tunneling current flow. The opposite
occurs when applying a small forward voltage across the junction, causing
n-side conduction band electrons to tunnel to the p-side valence band. As
the forward voltage is increased, fewer and fewer empty p-side energy states are
aligned with occupied n-side states, causing a decrease of the tunneling current.
The diode behaves as a negative resistance in the region where
the junction current decreases with increasing forward bias voltage. This
process continues until the tunnel effect transitions to a thermal current that
is associated with normal diode behavior.
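
The shape of the forward I-V curve described above can be reproduced with a commonly used empirical approximation: a tunneling term I_p·(V/V_p)·exp(1 − V/V_p) added to an ordinary diode term. This Python sketch is purely illustrative; the formula is a textbook approximation rather than a model taken from this work, and all parameter values are hypothetical.

```python
import math


def tunnel_diode_current(v, i_peak=1e-3, v_peak=0.1,
                         i_sat=1e-12, ideality=1.0, v_thermal=0.02585):
    """Empirical tunnel diode current in A for a bias voltage v in V.

    All default parameter values are hypothetical and for illustration only.
    """
    tunnel = i_peak * (v / v_peak) * math.exp(1.0 - v / v_peak)
    thermal = i_sat * (math.exp(v / (ideality * v_thermal)) - 1.0)
    return tunnel + thermal


for v in (0.0, 0.05, 0.1, 0.2, 0.4, 0.6):
    print(f"{v:.2f} V -> {tunnel_diode_current(v) * 1e3:.4f} mA")
```

With these parameters the current rises up to the peak voltage, falls in the negative-resistance region beyond it, and rises again once the thermal diode current dominates.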
Figure A.3: Energy band diagram of a tunnel diode in thermal equilibrium [104].
Figure A.4: Tunnel effect visualized on the energy band diagram and I-V curve transitioning
from reverse bias (a) to equilibrium (b) to forward bias (c-e) [104].
B
VCSEL models in Verilog-A
B.1 Non-linear lumped element electrical access network
and optical resonator
Figure B.1: Electrical lumped VCSEL model that explains the structure of the following
Verilog-A code. Note that the test outputs are not included in the drawing.
Listing B.1: Verilog-A code for a non-linear VCSEL model based on [49].
`include "constants.vams"
`include "disciplines.vams"

module VCSEL_USC_nonlinear_Raj(Cat, Popt, An, VRjunc, Rmon, Lmon);
    inout An, Cat;
    output Popt, VRjunc, Rmon, Lmon; // VRjunc, Rmon and Lmon are used as test outputs
    electrical An, Antemp, Cat, Popt, Junc, Intr, VRjunc, Rmon, Lmon;
    ground gnd;

    real Rsp;          // [Ohm]
    real Cm;           // [F]
    real Rm;           // [Ohm]
    real Ra;           // [Ohm]
    real Cp=84f;       // [F]
    real Iv;           // [A]
    real Ia;           // [A]
    real Rtemp=1m;     // intermediate resistance [Ohm]
    real n=2.1065;     // heterojunction ideality factor
    real Is=66p;       // saturation current [A]
    real Ith=0.83m;    // threshold current [A]
    real F=2.74;       // inverse slope efficiency [A/W], applied as 1/F*(Ia-Ith)
    real pole_thermal[0:1]={-1/1e-6, 0}; // thermal time constant [s]
    real L;            // optical resonator inductor [H]
    real C=100f;       // optical resonator capacitor [F]
    real R;            // optical resonator resistor [Ohm]
    real D=10.041;     // D-factor [GHz/sqrt(mA)]
    real gamma0=10.27; // damping offset [1/ns]
    real K=0.2536;     // K-factor [ns]

    branch (Junc, Cat) Cjunc, Rjunc;

    analog begin

        // Rtemp introduced to measure total VCSEL current flowing between Anode and Cathode
        I(An, Antemp) <+ V(An, Antemp)/Rtemp;
        I(Antemp, Cat) <+ Cp*ddt(V(Antemp, Cat));
        // low-pass filtering of VCSEL current to determine thermally settled electrical elements
        Iv=laplace_np(I(An, Antemp), {1}, pole_thermal);
        Rsp=3.781+16.5608/(1+Iv*1e3);
        Ra=n*$vt/Iv;
        Rm=41.2379+59.5092/(1+Iv*1e3)-Ra;
        Cm=(179.7187+335.7812/(1+Iv*1e3))*1e-15;
        I(Antemp, Junc) <+ V(Antemp, Junc)/Rsp;
        I(Cjunc) <+ Cm*ddt(V(Cjunc));
        // active region modeled electrically as Shockley diode Da
        I(Rjunc) <+ Is*(limexp((V(Rjunc)-I(Rjunc)*Rm)/$vt/n)-1);
        if (I(Rjunc)>Ith)
            Ia=I(Rjunc);
        else
            Ia=Ith+1n;
        V(VRjunc) <+ Ia; // output for monitoring current flowing through active region
        L=1/(4*pow(`M_PI,2)*C*pow(D,2)*(Ia-Ith)*1e21);
        R=L*(K*pow(D,2)*(Ia-Ith)*1e3+gamma0)*1e9;
        V(Intr, gnd) <+ 1/F*(Ia-Ith);
        V(Popt, gnd) <+ V(Intr, gnd)-R*I(Intr, Popt)-L*ddt(I(Intr, Popt));
        I(Intr, Popt) <+ C*ddt(V(Popt, gnd));
        V(Rmon, gnd) <+ R;
        V(Lmon, gnd) <+ L;
    end

endmodule
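As a numerical cross-check of Listing B.1, the following Python sketch evaluates the bias-dependent access-network elements and the optical resonator values L and R at a static bias point. It is not part of the original model. The function names, the fixed thermal voltage of 25.85 mV standing in for $vt, and the 6 mA example bias are assumptions made here for illustration. The thermal low-pass filter and the diode equation are not solved, so the same static current is used for both Iv and Ia.

```python
import math

# Constants copied from Listing B.1.
N_IDEALITY = 2.1065  # heterojunction ideality factor
I_TH = 0.83e-3       # threshold current [A]
C_RES = 100e-15      # optical resonator capacitor [F]
D_FACTOR = 10.041    # D-factor [GHz/sqrt(mA)]
GAMMA0 = 10.27       # damping offset [1/ns]
K_FACTOR = 0.2536    # K-factor [ns]
V_T = 0.02585        # assumed thermal voltage near 300 K [V], stands in for $vt

def access_network(i_bias):
    """Return (Rsp, Ra, Rm, Cm) for a static bias current i_bias in A."""
    i_ma = i_bias * 1e3
    r_sp = 3.781 + 16.5608 / (1 + i_ma)               # [Ohm]
    r_a = N_IDEALITY * V_T / i_bias                   # [Ohm]
    r_m = 41.2379 + 59.5092 / (1 + i_ma) - r_a        # [Ohm]
    c_m = (179.7187 + 335.7812 / (1 + i_ma)) * 1e-15  # [F]
    return r_sp, r_a, r_m, c_m

def optical_resonator(i_active):
    """Return (L [H], R [Ohm]) for an active-region current i_active in A."""
    # Same clamp as the listing: keep Ia - Ith positive at or below threshold.
    i_a = i_active if i_active > I_TH else I_TH + 1e-9
    f_r_squared = D_FACTOR**2 * (i_a - I_TH) * 1e21   # resonance frequency squared [Hz^2]
    inductance = 1 / (4 * math.pi**2 * C_RES * f_r_squared)
    damping = (K_FACTOR * D_FACTOR**2 * (i_a - I_TH) * 1e3 + GAMMA0) * 1e9  # [1/s]
    return inductance, inductance * damping

i_example = 6e-3  # hypothetical bias point [A]
r_sp, r_a, r_m, c_m = access_network(i_example)
l_res, r_res = optical_resonator(i_example)
f_r = 1 / (2 * math.pi * math.sqrt(l_res * C_RES))
print(f"Rsp = {r_sp:.2f} Ohm, Ra = {r_a:.2f} Ohm, Rm = {r_m:.2f} Ohm, Cm = {c_m * 1e15:.1f} fF")
print(f"L = {l_res:.3e} H, R = {r_res:.3e} Ohm, f_r = {f_r / 1e9:.2f} GHz")
```

Because the resonator is a series R-L branch loaded by C, 1/(2π√(LC)) reproduces the resonance frequency D·√(Ia − Ith) and R/L reproduces the damping factor K·f_r² + γ0, which is a quick way to confirm the unit scaling factors 1e21, 1e3 and 1e9 used in the listing.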
B.2 Transformed rate equations
Figure B.2: VCSEL model that explains the structure of the following Verilog-A code.
Note that the test outputs are not included in the drawing.
Listing B.2: Verilog-A code for a non-linear VCSEL model using the transformed rate
equations describing X(t) and Popt(t) based on [71].
`include "constants.vams"   // provides `M_PI
`include "disciplines.vams"

module VCSEL_USC_rate_equations(Cat, An, VRjunc, X, Popt);
    inout An, Cat;
    output Popt, VRjunc, X;
    electrical An, Antemp, Cat, Popt, X, Junc, VRjunc;
    ground gnd;

    real Rsp;           // [Ohm]
    real Cm;            // [F]
    real Rm;            // [Ohm]
    real Ra;            // [Ohm]
    real Cp=84f;        // [F]
    real Iv;            // [A]
    real Ia;            // [A]
    real Rtemp=1m;      // intermediate resistance [Ohm]
    real n=2.1065;      // heterojunction ideality factor
    real Is=66p;        // saturation current [A]
    real K=0.2536;      // K-factor [ns]
    real tn=97e-3;      // carrier lifetime [ns]
    real F=2.74*1e3;    // new rate equation parameter [mA/W]
    real Ise=0.41*1e-3; // new rate equation parameter [mA]
    real Ith=0.83;      // threshold current [mA]
    real B=3976;        // new rate equation parameter [GHz^2/mA]
    real pole_thermal[0:1]={-1/1e-6, 0}; // thermal time constant [s]

    parameter real tp=3.3e-3 from (1e-3:6e-3); // [ns]
    real tc=K/(4*pow(`M_PI,2))-tp; // new rate equation parameter

    branch (Junc, Cat) Cjunc, Rjunc;

    analog begin
        // Rtemp introduced to measure total VCSEL current flowing between Anode and Cathode
        I(An, Antemp) <+ V(An, Antemp)/Rtemp;
        I(Antemp, Cat) <+ Cp*ddt(V(Antemp, Cat));
        // low-pass filtering of VCSEL current to determine thermally settled electrical elements
        Iv=laplace_np(I(An, Antemp), {1}, pole_thermal);
        Rsp=3.781+16.5608/(1+Iv*1e3);
        Ra=n*$vt/Iv;
        Rm=41.2379+59.5092/(1+Iv*1e3)-Ra;
        Cm=(179.7187+335.7812/(1+Iv*1e3))*1e-15;
        I(Antemp, Junc) <+ V(Antemp, Junc)/Rsp;
        I(Cjunc) <+ Cm*ddt(V(Cjunc));
        // active region modeled electrically as Shockley diode
        I(Rjunc) <+ Is*(limexp((V(Rjunc)-I(Rjunc)*Rm)/$vt/n)-1);
        V(VRjunc) <+ I(Rjunc); // output for monitoring current flowing through active region

        // add dc operating point: ddt(signal)=0, adding a statement here gives
        // initial conditions for transient behavior
        if (analysis("dc"))
            V(Popt) <+ (I(Rjunc)*1e3-Ith-Ise)/2/F+sqrt(pow(I(Rjunc)*1e3-Ith-Ise,2)+4*I(Rjunc)*1e3*Ise)/2/F;
        else
            V(Popt) <+ -ddt(V(Popt))*tp*1e-9+(B*tn*Ith*(V(X)-1)*tp+1)/(1+F*B*tp*tc*V(Popt))*V(Popt)+Ise*Ith*B*tn*tp/F*V(X);
        V(X) <+ -ddt(V(X))*tn*1e-9+I(Rjunc)*1e3/Ith-(F*B*tp*(V(X)-1)*tn+F/Ith)/(1+F*B*tp*tc*V(Popt))*V(Popt);

    end
endmodule
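The static branch of Listing B.2 can be checked by hand. The following Python sketch evaluates the same closed-form expression for the DC optical power and the derived parameter tc, using the constants of the listing. The function names and the example currents are assumptions introduced here for illustration, and the junction current is supplied directly instead of being solved from the diode equation.

```python
import math

# Constants copied from Listing B.2 (currents in mA, times in ns).
F = 2.74e3      # new rate equation parameter [mA/W]
I_SE = 0.41e-3  # new rate equation parameter [mA]
I_TH = 0.83     # threshold current [mA]
K = 0.2536      # K-factor [ns]
T_P = 3.3e-3    # default value of the parameter tp [ns]

def popt_dc(i_junction):
    """Static optical power [W] for a junction current i_junction in A."""
    i_ma = i_junction * 1e3
    delta = i_ma - I_TH - I_SE
    # Same expression as the analysis("dc") branch of the listing.
    return (delta + math.sqrt(delta**2 + 4 * i_ma * I_SE)) / (2 * F)

def tc_ns(tp=T_P):
    """Derived rate equation parameter tc [ns], as computed in the listing."""
    return K / (4 * math.pi**2) - tp

for i_mA in (0.0, 0.5, 0.83, 2.0, 6.0):  # hypothetical bias points
    print(f"I = {i_mA:.2f} mA -> Popt = {popt_dc(i_mA * 1e-3) * 1e3:.4f} mW")
print(f"tc = {tc_ns() * 1e3:.3f} ps")
```

The expression gives zero power at zero current and approaches (I − Ith)/F well above threshold, while the Ise term smooths the knee around Ith, which is why it can serve as a continuous initial condition for the transient equations.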

Bibliography
[1] Statista, “Number of monthly active Facebook users worldwide
as of 3rd quarter 2017 (in millions),” 2017. [Online]. Available:
https://www.statista.com/statistics/264810/number-of-monthly-active-facebook-users-worldwide/
[2] Statistic Brain, “YouTube Company Statistics,” 2016. [Online].
Available: http://www.statisticbrain.com/youtube-statistics/
[3] M. Meeker, “Internet Trends 2017,” 2017. [Online]. Available:
http://www.kpcb.com/internet-trends
[4] Cisco, “The Zettabyte Era: Trends and Analysis,” 2017. [Online]. Available:
https://www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-networking-index-vni/vni-hyperconnectivity-wp.html
[5] Xcentric, “Xcentric,” 2010. [Online]. Available:
https://xcentric.com/cloud/datacenter/
[6] R. Miller, “Facebook to Build Second Data Center in NC,” 2011.
[Online]. Available:
http://www.datacenterknowledge.com/archives/2011/10/04/facebook-to-build-second-data-center-in-nc
[7] F. Masoud, “12 Mind-Blowing Data Center Facts You Need to Know,”
2016. [Online]. Available:
http://www.ciena.com/insights/articles/Twelve-Mind-blowing-Data-Center-Facts-You-Need-to-Know.html
[8] C. Xie, “Optical interconnects for data centers,” in 2016 IEEE Pho-
tonics Conf., vol. 1, no. c. Waikoloa, HI, USA: IEEE, Oct. 2016, pp.
59–60.
[9] Finisar, “Finisar FTLF8538P4BCL 25.78Gb/s Short-Wavelength
SFP+ Optical Transceiver Product Specification,” 2017. [Online].
Available: https://www.finisar.com/optical-transceivers/ftlf8538p4bcl
[10] J. Petrilla, “100G SR4 & RS(528, 514, 7, 10) FEC,” IEEE802.3bm,
Tech. Rep., 2012. [Online]. Available:
http://www.ieee802.org/3/bm/public/sep12/petrilla_02a_0912_optx.pdf
[11] IEEE P802.3bs 400GbE Task Force, “IEEE P802.3bs Baseline
Summary,” IEEE, Tech. Rep., 2015. [Online]. Available:
http://www.ieee802.org/3/bs/baseline_3bs_0715.pdf
[12] P. P. Baveja et al., “56 Gb/s PAM-4 directly modulated laser for
200G/400G data-center optical links,” in 2017 Opt. Fiber Commun.
Conf. Exhib. Los Angeles, CA, USA: OSA, Mar. 2017, pp. 1–3.
[13] N. Andriolli, P. Velha, M. Chiesa, A. Trifiletti, and G. Contestabile,
“A Directly Modulated Multiwavelength Transmitter Monolithically
Integrated on InP,” IEEE J. Sel. Topics. Quantum Electron., vol. 24,
no. 1, pp. 1–6, Jan. 2018.
[14] A. Abbasi et al., “High Speed Direct Modulation of a Heterogeneously
Integrated InP/SOI DFB Laser,” J. Lightw. Technol., vol. 34, no. 8,
pp. 1683–1687, Apr. 2016.
[15] S. Arai, “Low-threshold membrane DFB and DR lasers,” in 2017
IEEE Photonics Conf. (IPC). Orlando, FL, USA: IEEE, Oct. 2017,
pp. 163-164.
[16] S. Matsuo, T. Fujii, K. Takeda, H. Nishi, K. Hasebe and T. Kakitsuka,
“On-silicon integration of compact and energy-efficient DFB laser
with 40-Gbit/s direct modulation,” in 2015 IEEE 12th Int. Conf. Gr.
IV Photonics (GFP). Vancouver, BC, Canada: IEEE, Aug. 2015,
pp. 23-24.
[17] A. Larsson, M. Geen, J. Gustavsson, E. Haglund, A. Joel, P. Westbergh,
and E. Haglund, “30 GHz bandwidth 850 nm VCSEL with sub-100
fJ/bit energy dissipation at 25-50 Gbit/s,” Electron. Lett., vol. 51,
no. 14, pp. 1096–1098, Jul. 2015.
[18] R. Michalzik, VCSELs: Fundamentals, Technology and Applications
of Vertical-Cavity Surface-Emitting Lasers, ser. Springer Series in
Optical Sciences. Springer Berlin Heidelberg, 2013, vol. 166.
[19] M.-C. Amann and M. Ortsiefer, “The surface emitting semiconductor
laser,” U.S. Patent US7170917B2, 2007.
[20] R. Safaisini, E. Haglund, A. Larsson, J. Gustavsson, E. Haglund, and
P. Westbergh, “High-speed 850 nm VCSELs operating error free up
to 57 Gbit/s,” Electron. Lett., vol. 49, no. 16, pp. 1021–1023, Aug.
2013.
[21] A. Larsson, J. S. Gustavsson, E. Haglund, E. P. Haglund, T. Lengyel,
E. Simpanen, and M. Jahed, “VCSEL modulation capacity: Continued
improvements or physical limits?” in 2017 IEEE Opt. Interconnects
Conf. Santa Fe, NM, USA: IEEE, Jun. 2017, pp. 53–54.
[22] A. Caliman, A. Sirbu, V. Iakovlev, A. Mereuta, P. Wolf, D. Bimberg,
and E. Kapon, “>25 Gbps direct modulation and data transmission
with 1310 nm waveband wafer fused VCSELs,” in 2016 Opt. Fiber
Commun. Conf. Exhib. Anaheim, CA, USA: OSA, Mar. 2016, pp.
1–3.
[23] D. Ellafi, V. Iakovlev, A. Sirbu, G. Suruceanu, A. Mereuta, A. Cal-
iman, and E. Kapon, “Impact of photon lifetime on the high-speed
performance of 1.3-µm wavelength wafer-fused VCSELs,” in 2013
Conf. Lasers Electro-Optics Eur. Int. Quantum Electron. Conf. CLEO
Eur. Munich, Germany: IEEE, May 2013, p. 1.
[24] A. Gatto, A. Boletti, P. Boffi, C. Neumeyr, M. Ortsiefer, E. Ronneberg,
and M. Martinelli, “1.3-µm VCSEL Transmission Performance up to
12.5 Gb/s for Metro Access Networks,” IEEE Photon. Technol. Lett.,
vol. 21, no. 12, pp. 778–780, Jun. 2009.
[25] A. Mereuta et al., “Long wavelength VCSELs made by wafer fusion,”
in 2016 IEEE Photonics Conf. Waikoloa, HI, USA: IEEE, Oct. 2016,
pp. 337–338.
[26] M. Ortsiefer et al., “Long Wavelength VCSELs with Enhanced
Temperature and Modulation Characteristics,” in 2014 Int. Semicond.
Laser Conf. Palma de Mallorca, Spain: IEEE, Sept. 2014, pp. 74–75.
[27] M.-C. Amann and W. Hofmann, “InP-Based Long-Wavelength VC-
SELs and VCSEL Arrays,” IEEE J. Sel. Topics. Quantum Electron.,
vol. 15, no. 3, pp. 861–868, Apr. 2009.
[28] A. Malacarne et al., “Low-Power 1.3-µm VCSEL Transmitter for Data
Center Interconnects and Beyond,” in 43rd Eur. Conf. Opt. Commun.
Gothenburg, Sweden: VDE, Sept. 2017, pp. 1–3.
[29] M. Muller et al., “1550-nm High-Speed Short-Cavity VCSELs,” IEEE
J. Sel. Topics. Quantum Electron., vol. 17, no. 5, pp. 1158–1166, Sept.
2011.
[30] M. Muller, W. Hofmann, G. Bohm, and M.-C. Amann, “Short-Cavity
Long-Wavelength VCSELs With Modulation Bandwidths in Excess of
15 GHz,” IEEE Photon. Technol. Lett., vol. 21, no. 21, pp. 1615–1617,
Nov. 2009.
[31] S. Spiga, C. Xie, P. Dong, M.-C. Amann, and P. Winzer, “Ultra-high-
bandwidth monolithic VCSEL arrays for high-speed metro networks,”
in 2014 16th Int. Conf. Transparent Opt. Networks. Graz, Austria:
IEEE, Jul. 2014, pp. 1–4.
[32] S. Spiga, D. Schoke, A. Andrejew, M. Müller, G. Boehm, and M. C.
Amann, “Single-mode 1.5-µm VCSELs with 22-GHz small-signal
bandwidth,” in 2016 Opt. Fiber Commun. Conf. Exhib. Los Angeles,
CA, USA: OSA, Mar. 2016, pp. 1–3.
[33] W. Hofmann et al., “40 Gbit/s modulation of 1550 nm VCSEL,”
Electron. Lett., vol. 47, no. 4, p. 270, Feb. 2011.
[34] A. Malacarne et al., “Performance Analysis of 40-Gb/s Transmission
Based on Directly Modulated High-Speed 1530-nm VCSEL,” IEEE
Photon. Technol. Lett., vol. 28, no. 16, pp. 1735–1738, Aug. 2016.
[35] R. Rodes et al., “High-Speed 1550 nm VCSEL Data Transmission
Link Employing 25 GBd 4-PAM Modulation and Hard Decision
Forward Error Correction,” J. Lightw. Technol., vol. 31, no. 4, pp.
689–695, Feb. 2013.
[36] P. Westbergh, J. Gustavsson, A. Haglund, M. Skold, A. Joel, and
A. Larsson, “High-Speed, Low-Current-Density 850 nm VCSELs,”
IEEE J. Sel. Topics. Quantum Electron., vol. 15, no. 3, pp. 694–703,
Jun. 2009.
[37] P. Westbergh, J. Gustavsson, B. Kögel, A. Haglund, A. Larsson,
and A. Joel, “Speed enhancement of VCSELs by photon lifetime
reduction,” Electron. Lett., vol. 46, no. 13, p. 938, Jun. 2010.
[38] B. Kögel et al., “High-speed 850 nm VCSELs with 28 GHz modulation
bandwidth operating error-free up to 44 Gbit/s,” Electron. Lett., vol. 48,
no. 18, pp. 1145–1147, Aug. 2012.
[39] A. V. Rylyakov et al., “A 40-Gb/s, 850-nm, VCSEL-Based Full
Optical Link,” in Opt. Fiber Commun. Conf. Expo. (OFC/NFOEC),
2012 Nat. Fiber Opt. Eng. Conf. Los Angeles, CA, USA: OSA, Mar.
2012, pp. 1–3.
[40] J. Proesel, B. G. Lee, C. W. Baks, and C. Schow, “35-Gb/s VCSEL-
Based Optical Link using 32-nm SOI CMOS Circuits,” in Opt. Fiber
Commun. Conf. Fiber Opt. Eng. Conf. 2013. Anaheim, CA, USA:
OSA, Mar. 2013, p. OM2H.2.
[41] T. Yazaki et al., “25-Gbpsx4 optical transmitter with adjustable
asymmetric pre-emphasis in 65-nm CMOS,” in Proc. - IEEE Int.
Symp. Circuits Syst. Melbourne VIC, Australia: IEEE, Jun. 2014,
pp. 2692–2695.
[42] Y. Tsunoda, M. Sugawara, H. Oku, S. Ide, and K. Tanaka, “8.9
A 40Gb/s VCSEL over-driving IC with group-delay-tunable pre-
emphasis for optical interconnection,” in 2014 IEEE Int. Solid-State
Circuits Conf. - Dig. Tech. Pap., vol. 57. San Francisco, CA, USA:
IEEE, Feb. 2014, pp. 154–155.
[43] Y. Tsunoda et al., “22.8 A 24-to-35Gb/s x4 VCSEL driver IC with
multi-rate referenceless CDR in 0.13µm SiGe BiCMOS,” in 2015
IEEE Int. Solid-State Circuits Conf. - Dig. Tech. Pap. San Francisco,
CA, USA: IEEE, Feb. 2015, pp. 1–3.
[44] L. Szilagyi, G. Belfiore, R. Henker, and F. Ellinger, “30 Gbit/s 1.7
pJ/bit common-cathode tunable 850-nm-VCSEL driver in 28 nm
digital CMOS,” in 2017 IEEE Opt. Interconnects Conf. Santa Fe,
NM, USA: IEEE, Jun. 2017, pp. 51–52.
[45] W. Soenen, R. Vaernewyck, X. Yin, S. Spiga, and M.-C. Amann, “56
Gb/s PAM-4 Driver IC for Long-Wavelength VCSEL Transmitters,” in
42nd Eur. Conf. Opt. Commun. ECOC, no. 3. Dusseldorf, Germany:
VDE, Sept. 2016, p. Th.1.C.4.
[46] W. Soenen et al., “40 Gb/s PAM-4 Transmitter IC for Long-Wavelength
VCSEL Links,” IEEE Photon. Technol. Lett., vol. 27, no. 4, pp. 344–
347, Feb. 2015.
[47] W. C. Soenen et al., “A 40-Gb/s 1.5-µm VCSEL link with a low-power
SiGe VCSEL driver and TIA operated at 2.5 V,” in 2017 Opt. Fiber
Commun. Conf. Exhib. Los Angeles, CA, USA: OSA, Mar. 2017,
pp. 1–3.
[48] T. Shibasaki et al., “22.7 4×25.78Gb/s retimer ICs for optical links in
0.13µm SiGe BiCMOS,” in 2015 IEEE Int. Solid-State Circuits Conf.
- Dig. Tech. Pap. San Francisco, CA, USA: IEEE, Feb. 2015, pp.
1–3.
[49] M. Raj, M. Monge, and A. Emami, “A modelling and nonlinear
equalization technique for a 20 Gb/s 0.77pJ/b VCSEL transmitter in
32 nm SOI CMOS,” IEEE J. Solid-State Circuits, vol. 51, no. 8, pp.
1734–1743, Aug. 2016.
[50] D. Kuchta et al., “A 56.1Gb/s NRZ Modulated 850nm VCSEL-Based
Optical Link,” in Opt. Fiber Commun. Conf. Fiber Opt. Eng. Conf.
2013. Anaheim, CA, USA: OSA, Mar. 2013, p. OW1B.5.
[51] ——, “64Gb/s Transmission over 57m MMF using an NRZ Modulated
850nm VCSEL,” in Opt. Fiber Commun. Conf. San Francisco, CA,
USA: OSA, Mar. 2014, p. Th3C.2.
[52] D. M. Kuchta et al., “A 71-Gb/s NRZ Modulated 850-nm VCSEL-
Based Optical Link,” IEEE Photon. Technol. Lett., vol. 27, no. 6, pp.
577–580, Mar. 2015.
[53] ——, “Error-free 56 Gb/s NRZ modulation of a 1530-nm VCSEL
link,” J. Lightw. Technol., vol. 34, no. 14, pp. 3275–3282, Jul. 2016.
[54] J. Chen et al., “An Energy Efficient 56 Gbps PAM-4 VCSEL Transmit-
ter Enabled by a 100 Gbps Driver in 0.25 µm InP DHBT Technology,”
J. Lightw. Technol., vol. 34, no. 21, pp. 4954–4964, Nov. 2016.
[55] G. Belfiore, M. Khafaji, R. Henker, and F. Ellinger, “A 50 Gb/s 190
mW Asymmetric 3-Tap FFE VCSEL Driver,” IEEE J. Solid-State
Circuits, pp. 1–8, Jul. 2017.
[56] G. Belfiore, R. Henker, and F. Ellinger, “90 Gbit/s 4-level pulse-
amplitude-modulation vertical-cavity surface-emitting laser driver
integrated circuit in 130 nm SiGe technology,” in 2016 IEEE MTT-S
Lat. Am. Microw. Conf. Puerto Vallarta, Mexico: IEEE, Dec. 2016,
pp. 1–3.
[57] W. Soenen and X. Yin, “Pseudo-balanced driver,” European Patent
Application EP16177056, Jun. 30, 2016.
[58] S. Spiga, D. Schoke, A. Andrejew, G. Boehm, and M.-C. Amann,
“Effect of Cavity Length, Strain, and Mesa Capacitance on 1.5-µm
VCSELs Performance,” J. Lightw. Technol., vol. 35, no. 15, pp.
3130–3141, Aug. 2017.
[59] S. Spiga, J. Missine, and K. S. Kaur, “MIRAGE: D4.2 Flip-chip
adapted 26 Gb/s 4-element VCSEL arrays,” Tech. Rep., Aug. 2013.
[60] S. Spiga, D. Schoke, A. Andrejew, G. Boehm, and M.-C. Amann, “En-
hancing the small-signal bandwidth of single-mode 1.5-µm VCSELs,”
in 2016 IEEE Opt. Interconnects Conf. San Diego, CA, USA: IEEE,
May 2016, pp. 14–15.
[61] A. Bacou, A. Hayat, V. Iakovlev, A. Syrbu, A. Rissons, J. C. Mollier,
and E. Kapon, “Electrical modeling of long-wavelength VCSELs for
intrinsic parameters extraction,” IEEE J. Quantum Electron., vol. 46,
no. 3, pp. 313–322, Mar. 2010.
[62] R. S. Tucker, “High-Speed Modulation of Semiconductor Lasers,”
IEEE Trans. Electron Devices, vol. 32, no. 12, pp. 2572–2584, Dec.
1985.
[63] N. H. Zhu et al., “Small-Signal Equivalent-Circuit Model and
Characterization of 1.55µm Buried Tunnel Junction Vertical-Cavity
Surface-Emitting Lasers,” IEEE Trans. Microw. Theory Tech., vol. 58,
no. 5, pp. 1283–1289, May 2010.
[64] J. Gao, “High Frequency Modeling and Parameter Extraction for
Vertical-Cavity Surface Emitting Lasers,” J. Lightw. Technol., vol. 30,
no. 11, pp. 1757–1763, Jun. 2012.
[65] A. Bacou, A. Hayat, A. Rissons, V. Iakovlev, A. Syrbu, J.-C. Mollier,
and E. Kapon, “VCSEL Intrinsic Response Extraction Using T-Matrix
Formalism,” IEEE Photon. Technol. Lett., vol. 21, no. 14, pp. 957–959,
Jul. 2009.
[66] S. Spiga, A. Andrejew, G. Boehm, and M.-C. Amann, “Single-mode
1.5-µm VCSELs with small-signal bandwidth beyond 20 GHz,” in
2016 18th Int. Conf. Transparent Opt. Networks. Trento, Italy: IEEE,
Jul. 2016, pp. 1–4.
[67] H. Li, J. A. Lott, P. Wolf, P. Moser, G. Larisch, and D. Bimberg,
“Temperature-Dependent Impedance Characteristics of Temperature-
Stable High-Speed 980-nm VCSELs,” IEEE Photon. Technol. Lett.,
vol. 27, no. 8, pp. 832–835, Apr. 2015.
[68] J. Gustavsson, J. Vukusic, J. Bengtsson, and A. Larsson, “A com-
prehensive model for the modal dynamics of vertical-cavity surface-
emitting lasers,” IEEE J. Quantum Electron., vol. 38, no. 2, pp.
203–212, Aug. 2002.
[69] S. Spiga et al., “Single-Mode High-Speed 1.5-µm VCSELs,” J. Lightw.
Technol., vol. 35, no. 4, pp. 727–733, Feb. 2017.
[70] L. A. Coldren, S. W. Corzine, and M. L. Masanovic, Diode Lasers
and Photonic Integrated Circuits, 2nd ed. John Wiley & Sons, 2012.
[71] L. Bjerkan, A. Royset, L. Hafskjaer, and D. Myhre, “Measurement of
laser parameters for simulation of high-speed fiberoptic systems,” J.
Lightw. Technol., vol. 14, no. 5, pp. 839–850, May 1996.
[72] Mellanox Technologies, “Mellanox 100Gb/s QSFP28 LR4 Optical
Transceiver,” 2017. [Online]. Available:
http://www.mellanox.com/products/interconnect/ethernet-optical-transceivers.php
[73] S. U. H. Qureshi, “Adaptive Equalization,” Proc. IEEE, vol. 73, no. 9,
pp. 1349–1387, Sept. 1985.
[74] E. Säckinger, Broadband Circuits for Optical Fiber Communication.
John Wiley & Sons, 2005.
[75] M. Bruensteiner et al., “3.3-V CMOS pre-equalization VCSEL trans-
mitter for gigabit multimode fiber links,” IEEE Photon. Technol. Lett.,
vol. 11, no. 10, pp. 1301–1303, Oct. 1999.
[76] W. Soenen, B. Moeneclaey, X. Yin, and J. Bauwelinck,
“ADE assembler flow for rapid design of high-speed low-power
circuits,” in Proc. CDNLive, Cadence User Conf. 2017
EMEA, Munich, Germany, May 2017, pp. 1–2. [Online]. Available:
https://www.cadence.com/content/cadence-www/global/en_US/home/cdnlive/emea-2017/proceedings.html
[77] N. S. Nise, Regeltechniek voor technici, 3rd ed. Sdu Uitgevers, 2005.
[78] L. Zhou, A. Safarian, and P. Heydari, “CMOS wideband analogue
delay stage,” Electron. Lett., vol. 42, no. 21, p. 1213, Oct. 2006.
[79] J. Buckwalter and A. Hajimiri, “An active analog delay and the delay
reference loop,” in 2004 IEEE Radio Freq. Integr. Circuits Syst. Dig.
Pap. Fort Worth, TX, USA: IEEE, Jun. 2004, pp. 17–20.
[80] Y.-w. Chang, T.-c. Yan, and C.-n. Kuo, “Wideband Time-Delay
Circuit,” in 2011 6th Eur. Microw. Integr. Circuit Conf. Manchester,
UK: IEEE, Oct. 2011, pp. 454–457.
[81] B. Sedighi and J. Christoph Scheytt, “40 Gb/s VCSEL driver IC with
a new output current and pre-emphasis adjustment method,” in 2012
IEEE/MTT-S Int. Microw. Symp. Dig. Montreal, QC, Canada: IEEE,
Jun. 2012, pp. 1–3.
[82] D. M. Binkley, Tradeoffs and optimization in analog CMOS design.
John Wiley & Sons, 2008.
[83] R. Middlebrook, “Applications notes - The general feedback theorem:
a final solution for feedback systems,” IEEE Microw. Mag., vol. 7,
no. 2, pp. 50–63, Apr. 2006.
[84] ——, “Null double injection and the extra element theorem,” IEEE
Trans. Educ., vol. 32, no. 3, pp. 167–180, Aug. 1989.
[85] K. Ohhata, H. Imamura, Y. Takeshita, K. Yamashita, H. Kanai, and
N. Chujo, “Design of a 4 × 10 Gb/s VCSEL Driver Using Asymmetric
Emphasis Technique in 90-nm CMOS for Optical Interconnection,”
IEEE Trans. Microw. Theory Tech., vol. 58, no. 5, pp. 1107–1115,
May 2010.
[86] J. Rogers, C. Plett, and F. Dai, Integrated circuit design for high-speed
frequency synthesis. Artech House, 2006.
[87] D. Watanabe, A. Ono, and T. Okayasu, “CMOS optical 4-PAM
VCSEL driver with modal-dispersion equalizer for 10Gb/s 500m
MMF transmission,” in 2009 IEEE Int. Solid-State Circuits Conf. -
Dig. Tech. Pap., vol. 22, no. 9. San Francisco, CA, USA: IEEE, Feb.
2009, pp. 106–107,107a.
[88] T. Tatsumi, K. Tanaka, S. Sawada, H. Fujita, and T. Abe, “1.3 µm,
56-Gbit/s EML Module target to 400GbE,” in Opt. Fiber Commun.
Conf. Los Angeles, CA, USA: OSA, Mar. 2012, p. OTh3F.4.
[89] A. Moto and K. Uesaka, “Driver circuit for semiconductor laser diode
driven in differential mode,” U.S. Patent US8121160B2, 2012.
[90] L. Choi and W. Sun, “Differential opto-electronics transmitter,” U.S.
Patent US7227878B1, 2007.
[91] J. Yang, “Driver circuit with output common mode voltage control,”
U.S. Patent US6429700B1, 2002.
[92] B. Moeneclaey, J. Verbrugghe, E. Mentovich, P. Bakopoulos,
J. Bauwelinck, and X. Yin, “A 64 Gb/s PAM-4 Transimpedance
Amplifier for Optical Links,” in Opt. Fiber Commun. Conf. Los
Angeles, CA, USA: OSA, Mar. 2017, p. Tu2D.3.
[93] V. Kozlov and A. Chan Carusone, “Capacitively-coupled CMOS
VCSEL driver circuits,” IEEE J. Solid-State Circuits, vol. 51, no. 9,
pp. 2077–2090, Sept. 2016.
[94] D. M. Kuchta et al., “A 50 Gb/s NRZ modulated 850 nm VCSEL
transmitter operating error free to 90◦C,” J. Lightw. Technol., vol. 33,
no. 4, pp. 802–810, Oct. 2015.
[95] A. Abbasi et al., “Direct and Electroabsorption Modulation of a
III–V-on-Silicon DFB Laser at 56 Gb/s,” IEEE J. Sel. Topics. Quantum
Electron., vol. 23, no. 6, pp. 1–7, Nov. 2017.
[96] J. Verbist et al., “First real-time 100-Gb/s NRZ-OOK transmission
over 2 km with a silicon photonic electro-absorption modulator.” Los
Angeles, CA, USA: OSA, Mar. 2017, pp. 1–3.
[97] A. Fatemi, H. Klar, F. Gerfers, and D. Kissinger, “A 40 Gbps Micro-Ring
Modulator Driver Implemented in a SiGe BiCMOS Technology,”
in 2016 IEEE Compd. Semicond. Integr. Circuit Symp., vol. 2016-Novem.
IEEE, Oct. 2016, pp. 1–4.
[98] J. Li, G. Li, X. Zheng, K. Raj, A. V. Krishnamoorthy, and J. F.
Buckwalter, “A 25-Gb/s Monolithic Optical Transmitter with Micro-Ring
Modulator in 130-nm SoI CMOS,” IEEE Photon. Technol. Lett.,
vol. 25, no. 19, pp. 1901–1903, 2013.
[99] J. Wang, Y. Xuan, C. Lee, B. Niu, L. Liu, G. N. Liu, and M. Qi,
“Low-loss and misalignment-tolerant fiber-to-chip edge coupler based
on double-tip inverse tapers,” in Opt. Fiber Commun. Conf., vol. 7.
Washington, D.C.: OSA, 2016, p. M2I.6.
[100] G. Roelkens, D. Vermeulen, S. Selvaraja, R. Halir, W. Bogaerts,
and D. Van Thourhout, “Grating-Based Optical Fiber Interfaces for
Silicon-on-Insulator Photonic Integrated Circuits,” IEEE J. Sel. Topics.
Quantum Electron., vol. 17, no. 3, pp. 571–580, May 2011.
[101] R. Nave, “Fermi Function,” 1998. [Online]. Available:
http://hyperphysics.phy-astr.gsu.edu/hbase/Solids/Fermi.html
[102] ——, “Bands for Doped Semiconductors,” 1998. [Online]. Available:
http://hyperphysics.phy-astr.gsu.edu/hbase/Solids/dsem.html
[103] P. R. Gray, P. J. Hurst, S. H. Lewis, and R. G. Meyer,
Analysis and design of analog integrated circuits, 5th ed. John Wiley
& Sons, 2009.
[104] S. M. Sze, Physics of Semiconductor Devices, 2nd ed. John Wiley
& Sons, 1981.


