### Technical University of Denmark

### **High-speed Integrated Circuits for electrical/Optical Interfaces**

Jespersen, Christoffer Felix; Dittmann, Lars

Publication date: 2008

Document Version: Publisher's PDF, also known as Version of record

Citation (APA): Jespersen, C. F., & Dittmann, L. (2008). High-speed Integrated Circuits for electrical/Optical Interfaces.

# High-Speed Integrated Circuits for Electrical/Optical Interfaces at 100 Gb/s

Christoffer Felix Jespersen

Communication Technology, Department of Photonics Engineering, DTU Fotonik
Technical University of Denmark
DK-2800 Kgs. Lyngby, Denmark, 2006
Phone: +45 4525 6352

©2006 Christoffer F. Jespersen

### Abstract

This thesis is part of the general effort to increase the bandwidth of communication networks.
The thesis presents the results of the design of several high-speed electrical circuits for an electrical/optical interface. These circuits are a contribution to the ESTA project, carried out in collaboration with the OptCom project. The aim of the ESTA project was to investigate issues at 100 Gb/s and beyond, such as architecture and components. The OptCom project had a more tangible purpose: to create a 100 Gb/s optical/electrical transceiver demonstrator.

The thesis focuses on the design of VCO, LA and CDR circuits at the receiver interface, though VCOs are also found in the transmitter, where a multitude of independent sources have to be synchronized before they are multiplexed. The circuits are based on an InP DHBT process (VIP-2) supplied by Vitesse and made publicly available as MPW. The VIP-2 process represents the avant-garde of InP technology, with $f_t$ and $f_{max}$ well above 300 GHz.

Principles of high-speed design are presented as background before proceeding to circuits. A static divider is used as an example to illustrate many of the design principles. Theory and fundamentals of LC-oscillators, such as oscillator criteria, phase noise and different topologies, are given as background. The theory of PLL circuits is also presented. Guidelines and suggestions for static divider, VCO, LA and CDR design are presented using static divider, $50\text{-}100~\mathrm{GHz}$ VCO and $100~\mathrm{Gb/s}$ LA+CDR circuits as examples.

Finally, it is concluded that the VIP-2 process is a suitable technology for creating circuits for $100~\mathrm{Gb/s}$ communication networks.

**Keywords:** Indium Phosphide (InP), DHBT, VCO, Colpitts, Static Divider, CDR, PLL, Transceiver

# Sammenfatning på dansk (Summary in Danish)

Data (including telephony) is today transmitted over optical fibre, which offers high bandwidth and long reach. Optical communication systems consist of optical fibres and nodes, referred to as routers and switches.
The nodes receive and process the incoming data and transmit the outgoing data. Data can be processed both optically and electrically. Simple decisions can be made optically, but more complex processing must be done electrically. The latter requires converting the signal from the optical to the electrical domain. Likewise, a signal can be converted from the electrical to the optical domain.

The project concerns the development of electronic circuits for the interface between the optical and the electrical domain, and increasing the bandwidth from $10\text{-}40~\mathrm{Gb/s}$ (in existing systems) to $100~\mathrm{Gb/s}$. Besides the physical conversion of the signal itself, between light and current, both the optical signal's loss of synchronization and its comparatively large bandwidth must be taken into account. The former arises because data is transmitted optically without an accompanying clock signal indicating when the individual bits in the data stream begin and end. Without this synchronization it is difficult to interpret and recover the original information. The latter arises because optical signals can be transmitted at a higher bandwidth than is physically possible or economically reasonable to process electronically.

A complete electrical/optical interface consists of many different circuits. This project has focused on variable oscillators (VCO), which are used in several of the circuits, conversion of frequency and phase (static divider), and recovery of data and synchronization (LA and CDR). The recovery is a complex process in which the original signal is regenerated in shape, amplitude and timing (3R). The large bandwidth has placed high demands on the electronic circuits, which have been fabricated in an advanced InP process from Vitesse (VIP-2).

# List of Appended Papers

### Paper A

W-Band VCOs in InP DHBT for Electrical/Optical Transceivers. C. Jespersen. 7th Topical Workshop on Heterostructure Microelectronics (TWHM 2007), IEEE, 21-24 August 2007.
### Paper B

100 Gb/s CDR in InP DHBT. C. Jespersen. Manuscript.

# Related papers

- I. Design and test of InP DHBT ICs for a 100 Gb/s demonstrator system. T. Swahn, J. Hallin and T. Kjellberg. International indium phosphide and related materials conference proceedings, IEEE, 79-84, 7-11 May 2006.
- II. A 165-Gb/s 4:1 multiplexer in InP DHBT technology. J. Hallin, T. Kjellberg and T. Swahn. Journal of Solid State Circuits, IEEE, 41:2209-2214, October 2006.
- III. A 100-Gb/s 1:4 Demultiplexer in InP DHBT technology. J. Hallin, T. Kjellberg and T. Swahn. Journal of Solid State Circuits, IEEE, 41:2209-2214, October 2006.
- IV. Flip-Chip mounted 1:4 demultiplexer IC in InP DHBT technology operating up to 100 Gb/s. C. Kärnfelt, J. Hallin, T. Kjellberg, B. Hansson and T. Swahn. Manuscript.
- V. 104 Gb/s $2^{11}-1$ and 110 Gb/s $2^{9}-1$ PRBS generator in InP HBT technology. T. Kjellberg, J. Hallin and T. Swahn. International conference digest of technical papers solid-state circuits, IEEE, 2160-2169, February 6-9 2006.
# Contents

- Abstract
- Sammenfatning på dansk
- List of Appended Papers
- 1 Introduction
  - 1.1 The future of high speed communication systems
    - 1.1.1 The need for 40 & 100 GbE
    - 1.1.2 Technical feasibility
  - 1.2 Transceiver components
    - 1.2.1 Scope of the thesis
    - 1.2.2 State of the art circuits
      - 1.2.2.1 LA
      - 1.2.2.2 CDR
      - 1.2.2.3 Multiplexer and demultiplexer
      - 1.2.2.4 Static divider
      - 1.2.2.5 VCO
- 2 High-speed design
  - 2.1 Current Mode Logic
    - 2.1.1 ECL operation
    - 2.1.2 CML or ECL
  - 2.2 Signaling and transmission lines
    - 2.2.1 Signal wavelength
    - 2.2.2 Differential signalling
    - 2.2.3 Termination and reflection
    - 2.2.4 Conductor modelling and realisation
  - 2.3 $f_t$ and $f_{max}$
  - 2.4 Current sources
    - 2.4.1 Resistive current source
    - 2.4.2 Current mirror
  - 2.5 Layout
    - 2.5.1 Power supply & distribution
- 3 Static divider
  - 3.1 T flip-flop
  - 3.2 D latch
  - 3.3 Buffers
  - 3.4 Simulation
  - 3.5 Measurement
- 4 VCO
  - 4.1 Phase noise & jitter
  - 4.2 Design flow
  - 4.3 Testing
  - 4.4 Colpitts VCO circuit #1 (microstrip)
  - 4.5 Negative resistance VCO circuit
  - 4.6 Colpitts VCO circuit #2 (coplanar waveguide)
  - 4.7 VCO conclusion
- 5 Phase Locked Loops
  - 5.1 Phase detector
  - 5.2 Low-pass filter
  - 5.3 Linear amplifier
  - 5.4 VCO
  - 5.5 Divider
  - 5.6 PLL
    - 5.6.1 Locked state
      - 5.6.1.1 PLL transfer function
      - 5.6.1.2 PLL error transfer function
      - 5.6.1.3 PLL error response
    - 5.6.2 Tracking and acquisition
- 6 Clock and data recovery
  - 6.1 Architecture
  - 6.2 Receiver interface
  - 6.3 Limiting amplifier
  - 6.4 Phase detector
    - 6.4.1 Hogge-type phase detector (linear)
    - 6.4.2 Alexander-type phase detector (non-linear)
    - 6.4.3 Double Alexander-type phase detector (non-linear)
    - 6.4.4 D flip-flops
    - 6.4.5 Logical gates
      - 6.4.5.1 XOR gates
      - 6.4.5.2 AND/NAND gates
  - 6.5 Charge-pump filter
    - 6.5.1 Hogge-type phase detector input
    - 6.5.2 Double Alexander-type phase detector input
    - 6.5.3 Charge-pump filter implementation
  - 6.6 Buffers
  - 6.7 Linear amplifier
  - 6.8 Implementation of CDR circuits
    - 6.8.1 Double Alexander-type CDR circuit
    - 6.8.2 Hogge-type CDR circuit
  - 6.9 CDR circuit measurements
    - 6.9.1 VCO measurement results
    - 6.9.2 CDR measurement results
  - 6.10 Suggestions and improvements
    - 6.10.1 Design rule restrictions
    - 6.10.2 Circuit improvements
- Conclusions
- Acknowledgement
- Bibliography
- A Calculations
  - A.1 Differential stage
  - A.2 Inducti…
    - A.2.1
    - A.2.2
  - Relativ…
- B Simulations
  - B.1 HBT input capacitance
  - …al range of characteristic impedance for transmission lines (mi…
- C Schematics of select circuits
- D …n table for W-band measurements
- E VIP-2

# Abbreviations and notations

- 3R: Reshape, Reamplify & Retime
- AC: Alternating Current
- AGC: Automatic-Gain-Control
- AHDL: Analog Hardware Description Language
- BER: Bit Error Rate
- BJT: Bipolar Junction Transistor
- CDR: Clock and Data Recovery
- CID: Consecutive Identical Digits
- CML: Current Mode Logic
- CMOS: Complementary Metal-Oxide-Semiconductor
- CPW: CoPlanar Waveguide
- DC: Direct Current
- DEMUX: DEMUltipleXer
- DFF: D Flip-Flop
- DFT: Discrete Fourier Transformation
- DHBT: Double Heterostructure Bipolar Transistor
- DUT: Device Under Test
- ECL: Emitter-Coupled Logic
- EF: Emitter Follower
- FEC: Forward Error Correction
- FFT: Fast Fourier Transformation
- GaAs: Gallium Arsenide
- GbE: Gigabit Ethernet
- GSG: Ground-Signal-Ground
- HB: Harmonic Balance
- HBT: Heterostructure Bipolar Transistor
- HEMT: High Electron Mobility Transistor
- HSE: Higher Speed Ethernet
- HSSG: Higher Speed Study Group
- IC: Integrated Circuit
- IEEE: Institute of Electrical and Electronics Engineers, Inc.
- InP: Indium Phosphide
- I/O: Input/Output
- ISI: Inter Symbol Interference
- LA: Limiting Amplifier
- LSI: Large Scale Integration
- LO: Local Oscillator
- M#: Metal layer #
- MAC: Media Access Control
- MC2: ???
- MIM: Metal-Insulator-Metal
- MMF: MultiMode Fiber
- MPW: Multi-Project Wafer
- MSI: Medium-Scale Integration
- MUX: MUltipleXer
- NRZ: Non-Return-to-Zero encoding
- OM3: Optimized Multimode fiber type 3
- OTN: Optical Transport Network
- PD: Phase Detector
- PLL: Phase Locked Loop
- PLS: Physical Layer Signaling
- PRBS: Pseudo-Random Binary Sequence
- RMS: Root Mean Square
- RLGC: Resistance, Inductance, Conductance, Capacitance
- RZ: Return-to-Zero encoding
- SiGe: Silicon-Germanium
- SMF: Single Mode Fiber
- SNR: Signal to Noise Ratio
- SSI: Small-Scale Integration
- TAS: TransAdmittance Stage
- TFF: T Flip-Flop
- TIA: TransImpedance Amplifier
- TIS: TransImpedance Stage
- TTL: Transistor-Transistor Logic
- TWA: Travelling Wave Amplifier
- VCO: Voltage Controlled Oscillator
- VOD: Video On Demand

# Chapter 1

# Introduction

# 1.1 The future of high speed communication systems<sup>1</sup>

IEEE established the 802.3 Higher Speed Study Group (HSSG) to develop a road map beyond 10 GbE. Once the preliminary study was complete, the HSSG was transformed into the P802.3ba 40 Gb/s and 100 Gb/s Ethernet Task Force. The objective of the Task Force is to amend the IEEE 802.3 standard to encompass increased data rates while preserving as much of the current IEEE 802.3 standard as possible. The target completion date for the amendment is June 2010.

The amendment will support MAC data rates of 40 & 100 Gb/s and provide Physical Layer specifications which support 40 & 100 Gb/s operation over various media, as summarised in table 1.1. It must also provide the appropriate support for OTN.
Amending the existing standard will maintain compatibility with the installed base of IEEE 802.3 interfaces, previous investments in research and development as well as principles of network operation and management. The amendment will provide for the interconnection of equipment satisfying the distance requirements of the intended applications.

<sup>1</sup> Information relating to the progress of the 40 & 100 GbE amendment to IEEE 802.3 can be found on the IEEE P802.3ba 40 Gb/s and 100 Gb/s Ethernet Task Force homepage: http://www.ieee802.org/3/ba/

| Medium | 40 GbE | 100 GbE |
|-------------------|--------|---------|
| Backplane ≥ 1 m | | |
| Cu cable ≥ 10 m | | |
| OM3 MMF ≥ 100 m | | |
| SMF ≥ 10 km | | |
| SMF ≥ 40 km | | |

Table 1.1: Physical layer specifications to be defined by the HSE Task Force.

#### 1.1.1 The need for 40 & 100 GbE

The demand is driven by applications that have been demonstrated to require bandwidth beyond the existing capabilities, as defined by the 10 GbE standard. These include data centers, Internet exchanges, high performance computing and VOD. Bandwidth requirements for computing and core networking applications are growing at different rates, which necessitates the definition of two distinct data rates for the next generation of Ethernet networks in order to address these applications in a cost effective manner:

- Servers, high performance computing clusters, blade servers, storage area networks and network attached storage all currently make use of GbE and 10 GbE. I/O bandwidth projections for server and computing applications indicate that there will be a significant market potential for a 40 Gb/s Ethernet interface.
- Core networking applications have demonstrated the need for bandwidth beyond existing capabilities and the projected bandwidth requirements for computing applications.
  Switching, routing, and aggregation in data centers, Internet exchanges and service provider peering points, and high bandwidth applications, such as video on demand and high performance computing environments, have demonstrated the need for a 100 Gb/s Ethernet interface.

#### 1.1.2 Technical feasibility

The principle of scaling the IEEE 802.3 MAC to higher speeds has been well established by previous work within IEEE 802.3, such as GbE & 10 GbE. The principle of building bridging equipment which performs rate adaptation between IEEE 802.3 networks operating at different speeds has been amply demonstrated by the broad set of product offerings that bridge between 10, 100, 1000, and 10000 Mb/s. Systems with an aggregate bandwidth of 100 Gb/s or more have been demonstrated and deployed in operational networks. The IEEE 802.3 amendment will build on the array of Ethernet component and system design experience, and the broad knowledge base of Ethernet network operation:

- The experience gained in the development and deployment of 10 Gb/s technology is applicable to the development of specifications for components at higher speeds. For example, parallel transmission techniques allow reuse of 10 Gb/s technology and testing.
- Component vendors have presented data on the feasibility of the necessary components for higher speed solutions. Proposals, which either leverage existing technologies or employ new technologies, have been provided.

The reliability of Ethernet components and systems can be projected in the target environments with a high degree of confidence. Presentations demonstrating this have been provided by participants in the 802.3 HSSG.

### 1.2 Transceiver components

A glance at the block diagrams of various communication systems reveals that all systems, regardless of transmission medium, are made from a combination of just a few basic building blocks.
The VCO, static divider, limiting amplifier and the CDR circuits are among this handful of standard components.

The VCO is an oscillator used for synchronization of data streams or band selection. Synchronization is required for both CDR and MUX<sup>2</sup> circuits. The static divider creates a signal with a clock frequency equal to a fraction of the frequency of a reference clock and may also generate a quadrature half-rate<sup>3</sup> clock, which is often required by the decision circuits found in MUX, DEMUX and CDR circuits. MUX and DEMUX circuits require a fractional clock signal for each level of multiplexing/demultiplexing, whereas the clocking requirements for the CDR circuit depend on the choice of architecture.

A limiting amplifier is a non-linear amplifier that amplifies an input signal to a fixed voltage level, provided that the input signal is not too weak. The limiting amplifier restores the digital property of a signal that has suffered shape and amplitude distortion. The CDR circuit performs retiming, or even complete 3R<sup>4</sup> if it is combined with a limiting amplifier.

#### 1.2.1 Scope of the thesis

The circuits presented in this thesis are intended as building blocks for a 100 Gb/s serial optical/electrical transceiver. The transceiver performs a conversion between the optical domain and the electrical domain. The transceiver is placed between the optical network and the electrical switch fabric or is used to insert data into an optical switch fabric. The CDR function is part of the receiver path. A limiting amplifier is placed at the ingress of the CDR circuit because the input data signal has a very low (voltage) amplitude at this point. The VCO and static divider circuits are components of the CDR circuits. The static divider is also of interest in the receiver DEMUX and in the transmitter MUX, where it generates the clock frequencies required for multi-stage multiplexing and demultiplexing.
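
The divide-by-two action of the static divider, including the quadrature half-rate output mentioned above, can be illustrated with a purely behavioural model. The sketch below is an illustration under stated assumptions, not the CML implementation treated later in the thesis: it models two ideal level-sensitive latches (master and slave) with inverted feedback, and the function name `static_divider` and the sampled representation of the clock are hypothetical.

```python
def static_divider(clock_levels):
    """Behavioural divide-by-two (assumed ideal latches, no delay or noise).

    clock_levels: sequence of 0/1 clock samples.
    Returns (i_out, q_out): slave and master latch outputs, sample by sample.
    """
    master, slave = 0, 0
    i_out, q_out = [], []
    for clk in clock_levels:
        if clk:
            # Master latch is transparent while the clock is high and
            # follows the inverted slave output (the feedback path).
            master = 1 - slave
        else:
            # Slave latch is transparent while the clock is low.
            slave = master
        i_out.append(slave)
        q_out.append(master)
    return i_out, q_out


if __name__ == "__main__":
    clock = [1, 0] * 4          # four clock periods, two samples per period
    i_out, q_out = static_divider(clock)
    print("clock:", clock)
    print("I    :", i_out)      # [0, 1, 1, 0, 0, 1, 1, 0]
    print("Q    :", q_out)      # [1, 1, 0, 0, 1, 1, 0, 0]
```

Both outputs repeat every four samples, i.e. every two input clock periods, and the master output leads the slave output by one sample, which is a quarter of the output period. Daisy-chaining such stages would halve the frequency once more per stage.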
A block diagram of a transceiver architecture is shown in fig. 1.1.

Figure 1.1: Block diagram of an electrical/optical transceiver.

<sup>2</sup> If the incoming data channels are independently timed.

<sup>3</sup> Typically half-rate, but could be $1:2^n, n \in \mathbb{N}^*$ by iterating the process.

<sup>4</sup> Reshape, Reamplify and Retime, i.e. complete regeneration of the signal.

The CDR function is required to recover the synchronization and data of the incoming stream. The data signal is distorted during optical transmission, and the optical network carries no separate synchronization signal (clock) to aid the interpretation of the data, see fig. 1.2. The signal shape and amplitude can be regenerated by the LA, but the timing of the data is lost at the point of transmission. The timing must be recovered for the data to be correctly interpreted. The CDR circuit utilizes the incoming data signal to generate a clock signal that is phase-locked to the incoming data carrier, though the frequencies are not necessarily the same<sup>5</sup>. The regenerated clock signal is subsequently used to capture the data using a decision circuit (i.e. D flip-flops). The regenerated clock signal is propagated alongside the regenerated data to provide the essential downstream synchronization. Typically, the subsequent step is a demultiplexing process to (further) lower the signaling bandwidth to a more manageable, and less costly, level.

Figure 1.2: PRBS prior to and after transmission.

The VCO is a voltage controlled oscillator and acts as the heart of the PLL based CDR circuit. The VCO generates a clock signal and the phase of the clock signal is adjusted to match the phase of the incoming data carrier. A VCO circuit may also be used in the MUX where several data channels from independent sources may have to be buffered and synchronized before being merged.
The choice of VCO center frequency depends on the system architecture and data bandwidth.

The static divider generates a (possibly quadrature) clock signal at half the frequency of the incoming clock signal. A CDR circuit has an internal VCO whereas the MUX and DEMUX circuits rely on external clocks. The quadrature signal is useful for demultiplexing and for some types of phase detectors<sup>6</sup>. Several dividers may also be daisy-chained to provide the clock signals for consecutive steps of multiplexing or demultiplexing.

Design principles, simulation and measurement of the various circuits will be discussed within individual chapters, while the process description will be confined to an appendix.

<sup>5</sup> The VCO has a center frequency closely corresponding to a fraction of the bitrate, $1:2^n, n \in \mathbb{N}_0$. The required center frequency depends on the choice of architecture.

<sup>6</sup> A CDR circuit component.

#### 1.2.2 State of the art circuits

Equipment is commercially available for 10 and 40 Gb/s communication systems, and circuits operating in this range<sup>7</sup> have been published in recent years in various III-V technologies, such as InP and GaAs, as well as in SiGe. It is even possible to make the 10 Gb/s serial interface circuits in CMOS, and higher bandwidths are being explored using the same technology. The road ahead is as yet unresolved, with several possibilities currently being standardised and approved by the IEEE P802.3ba HSE Task Force. It is likely that these original standards will be provisional, i.e. they will be based on practical realisations and gradually fade away as technology develops and more mature standards begin to emerge. The telecommunication industry would prefer a standard of 2, 3 or 4×40 Gb/s to facilitate multiplexing and demultiplexing of existing 2.5/10/40 Gb/s channels, while the Ethernet standard has progressed in orders of magnitude and thus could be expected to aim at 100 Gb/s.
The developing 802.3ba standards focus on 100 Gb/s Ethernet in various forms. None of these currently aim at employing 100 Gb/s serial interfaces but instead utilise multiplexing schemes such as $2 \times 50$, $4 \times 25$ or $10 \times 10$ Gb/s. However, a 100 Gb/s serial interface is within reach of current III-V IC technology. Several groups have been working to provide the components and system verification of such interfaces. The experimental field has traditionally been dominated by NTT and various small groups working at different companies and universities.

Connecting circuits within a transceiver is no mean feat at 40 Gb/s and beyond. The obvious solution is to achieve a high degree of integration, so that several circuits can be merged into one or placed adjacently within customised flip-chip packages. The road map for 40 Gb/s has been to begin with simple circuits and progressively add complexity. The same path will be taken for 100 Gb/s. Several of the circuits mentioned in the following will thus have varying degrees of complexity.

#### 1.2.2.1 LA

The photo detector of the receiver generates a very weak current signal that has to be converted into a voltage signal and amplified. This is performed by a TIA/TWA circuit. The resulting signal is still weak and requires regeneration of shape and amplitude to regain its digital characteristics. This is the function of the LA circuit that provides the appropriate input data signal to a CDR circuit. Both the photo detector and the CDR circuit can be manufactured in InP technology but the processes for the two types of circuits are not compatible. This makes it (as yet) impossible to place both circuits on the same die. The solution has been to integrate the photo detector and the TIA/TWA as one package.
Integrating the LA with the TIA/TWA would not be a good idea because the imperfect interface between the LA and the CDR circuit would then require another LA at the ingress of the CDR circuit. Thus the solution has been to integrate the photo detector and the TIA/TWA as a package and the LA and the CDR circuit on a single die. Eventually, the TIA/TWA, LA, CDR circuit etc. will be integrated on a single die and share a package with the photo detector.

<sup>7</sup> The 40 Gb/s data stream is encoded to provide error correction and line coding. The typical bit-rates are 43 or 47 Gb/s.

| Year | InP | SiGe | CMOS |
|------|-------------|----------------|------------------------------------------|
| 1999 | | [1, 20] | |
| 2000 | | [21, 22, 23] | |
| 2001 | [5] | [3, 24, 25] | [12] |
| 2002 | [2, 6, 26] | [27] | [28] |
| 2003 | [8, 29, 30] | [31, 9, 4, 32] | [33, 34, 35, 36, 37, 38, 39, 40, 41, 42] |
| 2004 | [7] | [43] | [44, 45, 46, 47, 48, 49] |
| 2005 | [10, 50] | [51, 52, 53] | [54, 55, 56, 57] |
| 2006 | [11] | [58] | [59, 60, 61, 13, 62, 63] |
| 2007 | | | [14, 15, 16, 17, 18, 64, 65, 19] |

Table 1.2: Publications on CDR circuits at 10 Gb/s and beyond in the period 1999-2007, listed by year and technology.

The performance of the LA must then be viewed in the context of the CDR circuit. Its most revealing characteristic is the input sensitivity it gives the CDR circuit. The input sensitivity is not a pass/fail parameter but a trade-off between signal input power and BER in a particular environment. The HSSG objectives for 40 & 100 GbE aim at supporting a BER that is better than or equal to $10^{-10}$ or $10^{-12}$ at the MAC/PLS service interface.

#### 1.2.2.2 CDR

The CDR circuit is the most complex of the transceiver circuits. Siemens was first out with a 40 Gb/s CDR circuit [1]. Lucent<sup>8</sup> was next with a 40 Gb/s CDR circuit that also included a limiting amplifier and a 1:2 demultiplexer [2, 3, 4]. A 40 Gb/s MUX was later developed to make an integration test [4].
NTT has a long history in this field, including complex, low-power 10 Gb/s CDR circuits with integrated 1:4 DEMUX [5, 6]. A 40 Gb/s CDR circuit was then presented two years later [7], when Inphi Corporation and Sierra Monolithics, Inc. published similar results [8, 9]. The most recent development comes from the Fraunhofer Institute (Freiburg), where an 80 Gb/s CDR circuit with integrated 1:2 DEMUX [10, 11] has been developed. A 10 Gb/s CDR circuit emerged early on the CMOS front [12] and recent results have pushed the bandwidth beyond that limit [13, 14, 15, 16, 17, 18, 19]. The achievement demonstrated in [14] is particularly impressive.

All the published results on (electrical) CDR circuits at 10 Gb/s and beyond have been assembled in table 1.2 to provide an overview. The results have also been condensed into graphical form to illustrate the recent trends. The results have been distributed according to bandwidth and type of IC technology and presented in fig. 1.3.

<sup>8</sup> Several of the authors switched to CoreOptics during this period.

Figure 1.3: Published results for CDR circuits with a bitrate $\geq 10~\mathrm{Gb/s}$ in the period 1999-2007 distributed according to bitrate and technology.

#### 1.2.2.3 Multiplexer and demultiplexer

Multiplexers and demultiplexers are not a particular topic of this thesis, but they are based on the same D latches that are found in the phase detectors of CDR circuits. NTT has steadily improved their InP process, resulting in matching performance in their MUX and DEMUX circuits. 10 Gb/s DEMUX circuits were the first to emerge [5, 6] and were later followed by 50, 80 and 100 Gb/s MUX and DEMUX circuits [66, 67, 68] based on similar high performance decision circuits [69, 70]. Several other groups have also been involved [9, 32, 4, 11, 71, 18]. CTH has recently published a 100 Gb/s packaged demultiplexer [72, 73] as well as a 165 Gb/s multiplexer [74] for a 100 Gb/s demonstrator system [75, 76].
Some CDR circuits have also been fitted with on-chip multi-channel demultiplexers, e.g. [8, 32, 11, 71].

#### 1.2.2.4 Static divider

The performance of static dividers is often used as a benchmark for IC technology. Static dividers are digital circuits and are usually based on D flip-flops, thus providing a realistic view of the performance a particular process may offer. There are two known results using VIP-1 and VIP-2 [77, 78], and a few in other InP processes, e.g. [79]. The dividers in [77, 78] operate in the 40-80 Gb/s range.

#### 1.2.2.5 VCO

VCO circuits are difficult to compare because a VCO has many different parameters. One or more of these parameters may be optimized, but often at the expense of other parameters. An obvious reference is [78], because it has been manufactured in VIP-2. Fraunhofer has the current bandwidth record for CDR circuits, and has published two (almost identical) VCO circuits that are explicitly intended for CDR circuits [80, 81]. There are also a few InP-based VCO circuits in the W-band (75-110 GHz) and beyond. These results have been placed in a table with a few key parameters for easy comparison, see table 1.3. The phase noise and output power are the best values achieved within the tuning range. Higher frequencies have been achieved using other III-V technologies. These results have also been added to the table. The output power, $P_{out}$, is for differential output signals.

| Reference | Year | Technology | Maximum $f_{osc}$ (GHz) | L @ 1 MHz offset (dBc/Hz) | $P_{out}$ (dBm) |
|-----------|------|-----------------|------|------|------|
| [82] | 1999 | InP HBT | 108 | -88 | 0.92 |
| [83] | 1999 | InGaP/InGaAs | 104 | -95 | -3.4 |
| [83] | 1999 | InGaP/InGaAs | 134 | -72 | -10.4 |
| [84] | 2000 | InP HBT | 100 | -85 | -2 |
| [85] | 2003 | SiGe HBT | 150 | -85 | 3 |
| [86] | 2003 | GaInP/GaAs HBT | 77 | -92 | -2.3 |
| [87] | 2003 | SiGe | 77 | -95 | 14.3 |
| [88] | 2003 | SiGe | 79.6 | -94 | 11 |
| [89] | 2004 | SiGe | 99 | N/A<sup>9</sup> | 14.3 |
| [80] | 2004 | InP HBT | 75 | -97 | 11 |
| [78] | 2004 | InP DHBT | 84 | -71 | N/A<sup>10</sup> |
| [90] | 2004 | InP HBT | 80 | -118 | |
| [81] | 2005 | InP HBT | 89 | -102 | 8 |

Table 1.3: Publications on III-V VCO circuits in the W-band and beyond in the 1999-2007 period.

## Chapter 2

# High-speed design

The O/E-interface consists of high-speed building blocks. The designer has to make several choices, such as choosing an appropriate process technology and architecture for the task, as well as employing various design techniques to achieve the specifications. Some of the design techniques will be described in this chapter.

### 2.1 Current Mode Logic

The CDR circuit consists of a mix of digital and analog components. The digital components are based on the current mode logic (CML) family, also known as emitter-coupled logic (ECL) [91]. The chief characteristic of CML is that the transistors are always in the active region and can thus change state very rapidly, allowing CML circuits to operate at a very high speed. The main disadvantage is that the circuits draw a constant current, independent of the state, resulting in high current densities and power consumption. Most of the power is turned into heat, which has an adverse effect on circuit performance. Heat must be both restricted and effectively dissipated, for excess heat may impede, damage or even destroy a circuit if left unchecked. Current densities must also be limited to avoid electro-migration<sup>1</sup>.

CML gates are based on differential stages, as shown in fig. 2.1. The differential stage can be fitted with a pair of emitter followers (also known as a common collector configuration). This combination is denoted ECL. The emitter followers provide a bias configuration supplying a suitable, constant voltage level to the differential stage. The constant voltage level is at the midrange of the low and high logic levels to the differential stage. A differential input signal will overlay the constant bias voltage and pass almost unchanged (with respect to voltage amplitude) across the emitter followers.
The emitter followers perform level shifting as well as decoupling (impedance transformation) of the signal. The emitter followers act as buffers and can reduce the input load. The transistors of the emitter followers can be smaller than the transistors of the differential stage. This reduces the parasitic capacitance, $C_{BC}$, of the input and results in improved bandwidth. The effective current gain of transistors decreases with frequency, reducing the decoupling. Thus two (or even three) cascaded emitter followers are often required, sometimes denoted $E^2CL$. However, the improved decoupling comes at some cost to signal amplitude, primarily because of parasitic voltage losses over $r_E$ in the transistor. The emitter followers also provide gain peaking near the upper frequency limit, extending the bandwidth.

<sup>1</sup>Electro-migration is the transport of material caused by the gradual movement of ions in a conductor due to the momentum transfer between conducting electrons and diffusing metal atoms. The gradual process will cause erosion to conductors, eventually resulting in circuit failure. Fabs specify design rules to avoid electro-migration, generally by applying limits for current densities (given as $I/\mu m$ for a particular layer).

Figure 2.1: CML principle using a differential stage and emitter followers (ECL).

ECL offers a number of advantages, but it also requires more area, current and power relative to CML. ECL is used in critical circuits where speed is paramount, whereas CML is used elsewhere.

### 2.1.1 ECL operation

The principle of the differential stage operation can easily be demonstrated using the Ebers-Moll model [92]. A differential stage, with currents and voltages indicated, is shown in fig. 2.2.
The three current equations can be stated as:

$$I_A = \frac{I_S}{\beta + 1} \left[ \exp\left(\frac{V_A - V_{CC}}{V_T}\right) - 1 \right] \tag{2.1}$$

$$I_B = \frac{I_S}{\beta + 1} \left[ \exp\left(\frac{V_B - V_{CC}}{V_T}\right) - 1 \right] \tag{2.2}$$

$$I_{CC} = (\beta + 1)(I_A + I_B) \tag{2.3}$$

Figure 2.2: Differential stage marked with currents and voltages.

The differential output voltage is:

$$V_{out} = V_{AA} - V_{BB} = R(I_{BB} - I_{AA}) = R\beta(I_B - I_A) \tag{2.4}$$

Fiddling about<sup>2</sup> for a few minutes eventually yields:

$$V_{out} = \frac{R\beta \left(I_{CC} + 2I_S\right)}{(\beta + 1)} \times \tanh\left(\frac{-V_{in}}{2V_T}\right) \approx RI_{CC} \times \tanh\left(\frac{-V_{in}}{2V_T}\right) \tag{2.5}$$

where:

$$V_{in} = V_A - V_B \tag{2.6}$$

The factor of two in the argument follows from eqs. 2.1-2.3: $I_{CC} + 2I_S = I_S\left[\exp\left(\frac{V_A - V_{CC}}{V_T}\right) + \exp\left(\frac{V_B - V_{CC}}{V_T}\right)\right]$, and the ratio of the difference to the sum of two exponentials is the hyperbolic tangent of half the difference of their arguments. The details are shown in section A.1. The resulting equation has the form $V_{out} = f(V_{in})$. The function is shown in fig. 2.3 for a set of typical values. $V_{out}$ varies steeply with $V_{in}$ around the origin for realistic values of R. A maximum differential voltage swing of $\sim RI_{CC}$ can be achieved. For a cascade of identical differential stages, the voltage swing can be found as $V_{out} = -V_{in}$ in the curve, and is typically about 95% of the full voltage swing. The differential stage operates as a non-linear amplifier, reshaping and reamplifying a differential signal that has become distorted or weakened, providing the basis for digital logic.

<sup>2</sup>See e.g. UsingEnglish.com.

Figure 2.3: Differential stage output voltage as a function of input voltage.

#### 2.1.2 CML or ECL

The choice between CML and ECL depends on a number of factors. The emitter followers require additional power without contributing to the logical function. The emitter followers also serve the purpose of buffering the collector load resistor of the CML, thereby isolating the load from the switching time constant. It is this property that is usually cited when arguing that ECL is faster than CML.
This is probably true for an SSI circuit operating in the transistor limited speed range. The argument becomes less convincing for MSI/LSI circuits where the current levels involved produce RC time constant limited gate speeds. The increase in gate propagation delay caused by the emitter followers is approximately equal to the transit time of the transistor. This can be justified if the extra propagation delay is more than compensated by a reduction in the RC time constant that the external load places on the differential pair.

A capacitively loaded CML gate and switching waveforms are shown in figs. 2.4 & 2.5. The differential pair itself is given a zero delay to simplify the comparison. The CML output voltage exhibits a typical RC time constant. The voltage swing, $V_S$, is defined as $V_H - V_L$. The propagation delay, $t_p$, is defined as the time when the output voltage crosses the threshold at the midpoint between the logical levels, $(V_H + V_L)/2$:

$$t_{p,CML} = RC_L \ln(2) = \frac{V_S C_L \ln(2)}{I_D} \approx \frac{V_S C_L}{1.4 \times I_D} \tag{2.7}$$

Figure 2.4: Loaded CML gate.

Figure 2.5: Waveform for CML gate switching.

The worst case for the ECL gate is the falling edge. It is limited by the amount of current available to drive the load from the pull-down current source, $I_L$. The loaded circuit and corresponding waveform are shown in figs. 2.6 & 2.7. The propagation delay is:

$$t_{p,ECL} = \frac{V_S C_L}{2 \times I_L} \tag{2.8}$$

Figure 2.6: Loaded ECL gate.

Figure 2.7: Waveform for ECL gate switching.

The next step is to determine when the propagation delays of CML and ECL are equal:

$$t_{p,ECL} - t_{p,CML} = 0 \Rightarrow \tag{2.9}$$

$$\frac{V_S C_L}{2 \times I_L} - \frac{V_S C_L \ln(2)}{I_D} = 0 \Rightarrow \tag{2.10}$$

$$I_D = 2\ln(2) \times I_L \approx 1.4 \times I_L \tag{2.11}$$

The current sources of the differential stage and the emitter followers are typically of similar size. The CML gate would then use about 30% less power than an ECL gate with similar performance, since it draws $1.4 \times I_L$ against $I_D + I_L = 2I_L$ for the ECL gate. The previous examples are all single-ended. One of the features of ECL/CML is the complementary logic output function. Using a second emitter follower on the complementary output would further improve the power consumption advantage of the CML gate to about 50%. Operating the differential stage at optimum current density would require twice the current through the core relative to the emitter followers, given similar size transistors. The power advantage of CML would then be reduced to 7% and 30% for single-ended and differential output respectively. The results are shown in table 2.1. The conclusion, so far, is that CML has an inherent power advantage over ECL.

| $\frac{P_{ECL} - P_{CML}}{P_{ECL}}$ | $I_D = I_L$ | $I_D = 2I_L$ |
|-------------------------------------|-------------|--------------|
| Single-ended | 30% | 7% |
| Differential | 50% | 30% |

Table 2.1: Power savings of CML relative to ECL, for stages with equal performance.

The differential voltage swing, $V_S$, is $\approx RI_{CC}$. This is a straightforward formula, but not every set of values for R and $I_{CC}$ is desirable. A large voltage swing will make the signal more immune to noise, but requires a large R and/or substantial $I_{CC}$. A large R increases switching time. The output load, $C_{load}$, is discharged through a resistance, R, creating a time constant of $\tau_{fall} = RC_{load}$. The switching time is thus proportional to R (and $C_{load}$). $I_{CC}$ is equally problematic. The circuit power consumption, and thus heat generation, is $P = -V_{SS}I_{CC}$. An improved voltage swing is therefore bought at the cost of switching time and/or power consumption.

CML has the additional feature of having a nearly constant current consumption. This reduces the noise in the supply voltages considerably. Some current spikes occur during switching due to the limitations of the less than ideal current source.
The problem can be mitigated by adding on-chip decoupling capacitance to the supply grid and by the design of the current source, see section 2.4.

### 2.2 Signaling and transmission lines

### 2.2.1 Signal wavelength

A chip consists of interconnected circuits. The interconnections carry electrical signals through metal conductors. Some circuits are placed together to form functional blocks, such as a VCO or an amplifier. Interconnects can be categorised based roughly on length: short interconnects within functional blocks and long interconnects between functional blocks. The reference to length is a bit loose, for it is related to both the signal bandwidth and the physical length of the interconnect. An interconnect is considered short if $l \ll \lambda/4$, i.e. the signal can safely be assumed to be uniform over the length of the conductor. The wave nature of a signal can be ignored for short interconnects but must otherwise be taken into consideration. The signal wavelength is related to both the signal bandwidth<sup>3</sup> and the signal propagation velocity of a particular conductor:

$$l \ll \lambda/4 = \frac{v}{4f} \tag{2.12}$$

The velocity is related to physical properties of the conductor and its surrounding materials, as well as geometry:

$$v = \frac{1}{\sqrt{\epsilon \mu}} = \frac{c}{\sqrt{\epsilon_r \mu_r}} \tag{2.13}$$

The relative permeability, $\mu_r$, is close to unity (for most materials), whereas the relative permittivity, $\epsilon_r$, is effectively close to 4.0 for the VIP-2 process<sup>4</sup>. This yields $v = c/\sqrt{4} = 1.5 \times 10^8$ m/s. The critical length, $l_c$, in VIP-2 can thus be easily found for an interconnect, given the frequency: $l_c = 375\ \mu m$ for a 100 GHz clock signal. The designer must be mindful of the length of interconnections and take the appropriate action where necessary.
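The critical length given by eqs. 2.12 and 2.13 can be checked numerically. The following is a minimal sketch, assuming $\mu_r = 1$ and the effective $\epsilon_r = 4.0$ quoted for VIP-2; the function names are illustrative and not part of any tool flow.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum [m/s]


def propagation_velocity(eps_r: float, mu_r: float = 1.0) -> float:
    """Signal velocity v = c / sqrt(eps_r * mu_r), eq. 2.13."""
    return C0 / math.sqrt(eps_r * mu_r)


def critical_length(freq_hz: float, eps_r: float, mu_r: float = 1.0) -> float:
    """Quarter wavelength l_c = v / (4 f), the bound in eq. 2.12."""
    return propagation_velocity(eps_r, mu_r) / (4.0 * freq_hz)


if __name__ == "__main__":
    # eps_r = 4.0 is the effective value quoted in the text for VIP-2.
    # The list of frequencies is an arbitrary illustration.
    for f_ghz in (25, 50, 100):
        l_c = critical_length(f_ghz * 1e9, eps_r=4.0)
        print(f"{f_ghz:4d} GHz: l_c = {l_c * 1e6:7.1f} um")
```

For 100 GHz this gives a quarter wavelength of roughly 375 µm, in line with the value stated in the text; an interconnect has to be much shorter than this to be treated as a lumped connection.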
### 2.2.2 Differential signalling

Differential signalling on balanced lines has several advantages over single-ended signalling [93]:

- A differential signal requires no common reference voltage, unlike a single-ended signal. This facilitates signalling across different logic families (ECL, TTL etc.), technologies (e.g. SiGe), circuits and chips.
- The voltage swing can be made lower, while maintaining the integrity of the signal.
- Symmetrical pulse edges compensate for transients.

The lack of a common reference voltage eliminates the problem of jitter in the reference voltage affecting the interpretation of the signal. Noise immunity is also improved by the relative increase in amplitude and the rejection of common mode noise. An example is shown in fig. 2.8. The single-ended signal is interpreted relative to a reference voltage. A subtractor is used to discern the binary value. Both noise and (more specifically) common mode noise may result in erroneous interpretation if the noise amplitude crosses the threshold, as shown by the two errors slipping through.

Figure 2.8: The effects of noise in single-ended and differential signaling.

<sup>3</sup>It is actually the spectral component with the highest frequency that dictates the minimum wavelength. The spectral component with the highest frequency and the signal bandwidth are the same for a baseband data signal. Not all signals employ the baseband, e.g. clock signals occupy extremely narrow bandwidths while their respective spectra are located around much higher center frequencies.

<sup>4</sup>The effective $\epsilon_r$ depends on the configuration. The conductor is placed differently, relative to the surrounding layers, for each of the four metal layers in the VIP-2 process. Metal 4 is the most useful signal conductor due to its greater distance to substrate.

The differential signal is complementary, and the subtractor thus yields twice the amplitude.
The improved noise margin reduces the likelihood of errors. The differential line is subjected to the same level of noise as the single-ended line, but the noise is now significantly below the threshold and does not generate any errors.

The two lines are closely spaced to minimize the effects of interference. The intention is to ensure that any noise source has equal influence on both of the balanced lines. The noise would then become common mode. Common mode noise is effectively eliminated at the subtractor, as shown by the example, whether it originates in the source or along the lines.

The example shows how the effective signal amplitude doubles with differential signaling. This is also true for the slope of the pulse edges, which become twice as steep. This allows for higher data rates with improved eye diagrams. Slope ($\propto 1/RC_{load}$), differential voltage swing ($\sim RI_{CC}$) and power consumption ($-V_{SS}I_{CC}$)<sup>5</sup> are somewhat interchangeable.

<sup>5</sup>Reducing the voltage swing across the output resistors would reduce the $-V_{SS}$ required for driving the circuit. The circuit power consumption is $P=-V_{SS}I_{CC}$. A typical $V_{SS}$ is -4 V and a 50% reduction of a 400 mV voltage swing (200 mV) would save 5% of the power. Changes in $I_{CC}$ for a particular circuit have a much greater impact. A 50% reduction in $I_{CC}$ would yield a 50% reduction in both the voltage swing and in P.

Differential signalling also has several disadvantages. Twice as many conductors are used, requiring additional area, longer conductors (because of the larger area occupied by the conductors) and possibly creating routing problems. The lengths of the conductors also have to be matched to ensure synchronous arrival of the signal at the receiver. Furthermore, the conductors must be closely spaced to reap the reward of common mode suppression, particularly if significant noise sources are found on-chip.

All of the circuits presented in this thesis employ differential signalling for all signals, both analog and digital, with the exception of DC biasing signals.

#### 2.2.3 Termination and reflection

Signals are transmitted between circuits. For long transmission lines, the wave nature of the signals cannot be ignored. A simple circuit containing a source and a load is shown in fig. 2.9. The difference between the impedance of the source, $Z_S$, and the impedance of the load, $Z_L$, will cause a discontinuity. This will result in part of the wave being reflected at the discontinuity. The reflection coefficient, $\Gamma$, is the ratio of the amplitude of the reflected wave to the amplitude of the incident wave. The reflection coefficient is given by:

$$\Gamma = \frac{Z_L - Z_S}{Z_L + Z_S} \tag{2.14}$$

Figure 2.9: Simple circuit configuration showing measurement location of reflection coefficient.

The same process takes place whenever a wave encounters a discontinuity. The formula shows that no reflection ($\Gamma = 0$) occurs if the two sides of the discontinuity are impedance matched: $Z_S = Z_L$.

A simple case is shown in fig. 2.10. The source and the receiver are separated by a transmission line with the characteristic impedance of $Z_0$. The receiver consists of emitter followers with high impedance ($|Z_L| \gg |Z_0|$) and the result is that much of the signal is reflected ($|\Gamma| \approx 1$). The reflected wave will eventually arrive back at the source, where $Z_S \approx R_L$. Another reflection will occur at the source if the source and line are not perfectly matched ($Z_S \neq Z_0$). The signal will quickly stabilize for short lines, but the effect on longer lines will be ringing and signal distortion.

Figure 2.10: Transmission line without termination.

The reflections can be dealt with by moving the load to the receiver, as shown in fig. 2.11.
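The reflection coefficient of eq. 2.14 is easy to evaluate for the limiting cases discussed here. The sketch below is illustrative only: the 50 Ω reference impedance and the 5 kΩ stand-in for a high-impedance emitter follower input are assumed values, not VIP-2 data.

```python
def reflection_coefficient(z_load: complex, z_ref: complex) -> complex:
    """Gamma = (Z_L - Z_ref) / (Z_L + Z_ref), eq. 2.14.

    z_ref is the impedance the incident wave arrives from (Z_S in eq. 2.14,
    or the line impedance Z_0 when a transmission line is present).
    """
    return (z_load - z_ref) / (z_load + z_ref)


if __name__ == "__main__":
    z_ref = 50.0  # assumed reference impedance [ohm]
    cases = {
        "matched load": 50.0,
        "high-impedance load (assumed 5 kohm)": 5e3,
        "short circuit": 0.0,
    }
    for name, z_load in cases.items():
        gamma = reflection_coefficient(z_load, z_ref)
        print(f"{name}: |Gamma| = {abs(gamma):.3f}")
```

Complex impedances can be passed directly, so a resistive termination in parallel with an input capacitance can be examined the same way at a chosen frequency.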
It is possible to achieve a very good impedance match between the transmission line impedance and the load resistors, but the load will still present a mismatch because of the input capacitance of the emitter followers. The resulting reflections will eventually arrive back at the source, which is not impedance matched at all.

Figure 2.11: Transmission line with single termination.

Reflections on the transmission line can be efficiently reduced by employing impedance matching at both the source and receiver. This is shown in fig. 2.12. The load and source resistances have now been doubled, as well as the line impedance. This could be difficult to implement. Resistors are manufactured from specialized layers. Resistance depends on the geometry used and can cover a wide range of values, typically three orders of magnitude<sup>6</sup>. The same dynamic range is not possible for transmission lines, as will be shown in section 2.2.4. Having double matched termination is therefore only possible when load resistance, and thus the matching line impedance, is sufficiently low for the transmission line impedance to be realized.

Figure 2.12: Transmission line with double termination.

<sup>6</sup>The practical range is about 2 to 2000 $\Omega$ for VIP-2.

The voltage at the emitter followers can be amplified by utilizing the transmission line as an inductive element, see fig. 2.13. The inductive line and the (parasitic) load capacitance of the emitter follower will resonate and create gain peaking. The effective inductance, $L_{eff}$, is given by:

$$L_{eff} = \frac{Z_0}{2\pi f} \tan\left(\frac{2\pi l}{\lambda}\right) \tag{2.15}$$

Figure 2.13: Transmission line with inductive line.

The gritty details have been relegated to section A.2. It should be noted that $L_{eff}$ is proportional to $Z_0$. The difficulties in achieving a high $Z_0$ effectively limit $L_{eff}$. The highest $Z_0$ that can be obtained using VIP-2 is about 52 $\Omega$.
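Eq. 2.15 can be evaluated directly to get a feel for the inductance a short line provides. This is a minimal sketch, assuming the propagation velocity of $1.5 \times 10^8$ m/s from section 2.2.1 and $Z_0 = 52\ \Omega$; the line lengths are arbitrary illustrations, and the helper `low_frequency_inductance` is simply the small-angle limit of eq. 2.15 ($\tan x \approx x$), added here for comparison.

```python
import math


def effective_inductance(z0: float, length: float, freq: float,
                         velocity: float) -> float:
    """L_eff = Z0 / (2 pi f) * tan(2 pi l / lambda), eq. 2.15."""
    wavelength = velocity / freq
    if not 0.0 <= length < wavelength / 4.0:
        # Beyond a quarter wavelength the line no longer looks inductive.
        raise ValueError("line must be shorter than a quarter wavelength")
    return z0 / (2.0 * math.pi * freq) * math.tan(2.0 * math.pi * length / wavelength)


def low_frequency_inductance(z0: float, length: float, velocity: float) -> float:
    """Small-angle limit of eq. 2.15: L ~ Z0 * l / v (frequency independent)."""
    return z0 * length / velocity


if __name__ == "__main__":
    z0 = 52.0         # highest line impedance quoted for VIP-2 [ohm]
    velocity = 1.5e8  # from section 2.2.1 [m/s]
    freq = 100e9      # [Hz]
    for length_um in (50, 100, 150, 200):  # illustrative lengths
        length = length_um * 1e-6
        l_eff = effective_inductance(z0, length, freq, velocity)
        l_lf = low_frequency_inductance(z0, length, velocity)
        print(f"l = {length_um:3d} um: L_eff = {l_eff * 1e12:5.1f} pH "
              f"(low-frequency limit {l_lf * 1e12:5.1f} pH)")
```

The two values diverge as the length approaches a quarter wavelength, which illustrates why $L_{eff}$ becomes strongly frequency dependent for longer lines.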
The length of the transmission line has to be restricted to less than $\lambda/4$, because the line appears to be capacitive beyond this point. The line should be made well short of this boundary to ensure that the signal spectrum is encompassed. $L_{eff}$ also becomes less predictable around this point. The effective inductance for a realistic VIP-2 transmission line is calculated for various lengths and the results are presented in fig. 2.14. The effective inductance is less than 100 pH for realistic lengths. The effect of the inductive line will depend on the signal, and thus on the application.

Figure 2.14: Effective inductance of a transmission line. f=100 GHz, $Z_0=52$ $\Omega$ & $\lambda=375$ $\mu m$.

An example will demonstrate the resonance between the inductive line and the capacitive load. A frequency domain analysis is selected because it simplifies the conditions. The setup is identical to fig. 2.13, but the differential stage source is replaced with an ideal source. The rest of the components use their respective models. $R_L$ and $Z_0$ are both set to 52 $\Omega$. l is 150 $\mu$m, equivalent to about 50 pH. The emitter followers use minimum size<sup>7</sup> transistors. The input capacitance is about 7.3 pF, but somewhat frequency dependent; see section B.1 for details. The result is shown in fig. 2.15 for various frequencies. The extra amplitude is useful, but the price is ringing/overshoot and group delay.

<sup>7</sup>With respect to VIP-2. Minimum emitter width and length are 0.5 $\mu$m×1.0 $\mu$m, corresponding to an effective area of 0.85 $\mu$m$^2$.

A time domain analysis will complete the picture. The setup is similar to the previous example with a sinusoid source. Two different lengths of transmission line are used, one 20 $\mu$m and the other 200 $\mu$m. The voltages at the emitter followers are shown in fig. 2.16. The source signal is shown as reference.
The rise and fall time of the signal can be improved by adding inductive elements to the receiver load, as shown in fig. 2.17. The inductors are implemented as microstrips shorted to ground. The method is known as inductive peaking [94, 95] and increases the AC load impedance:

$$\frac{V_{out}}{I_{in}} = j\omega L + R_L \tag{2.16}$$

Again, the inductor will resonate with load capacitance and the result will be ringing and signal distortion (jitter). The effects of inductive peaking are demonstrated in fig. 2.18.

Figure 2.15: Simulation of inductive line with and without capacitive load.

### 2.2.4 Conductor modelling and realisation

The CDR circuit is fairly large, both in physical size and number of components. This makes it necessary to use long interconnections between sub-circuits. Long signal lines<sup>8</sup> are modeled and realized as frequency dependent transmission lines to more accurately predict their behavior.

<sup>8</sup>Length $> \lambda/10$, or about 140 $\mu m$ for a 100 GHz signal in the VIP-2 process. Top and intermediary $\varepsilon_r$ are 6.0 and 4.2 respectively.

Transmission lines use the top metal layer (metal 4) over an unbroken ground ($V_{GND}$) layer (metal 2), and the lines are adequately spaced to minimize coupling (6-8 $\mu$m). 50 $\Omega$ is used for most terminations, but 36 $\Omega$ at the latches. The highest achievable impedance is 51.6 $\Omega$ (3 $\mu$m metal 4 over metal 2), and a slightly wider line (3.5 $\mu$m) yields almost exactly 50 $\Omega$. 33.9 $\Omega$ can be achieved by 4.5 $\mu$m metal 3 over metal 2. The practical range is between 30 and 51.6 $\Omega$.

2-D models for transmission lines exist in both ADS and Cadence, but the complexity of the models makes them very time-consuming for large circuits. The inherent Cadence model is particularly slow, and tends to cause convergence difficulties more often than not, when the initial state is computed. A simpler, frequency specific model is shown in fig. 2.19.
It consists of a cascade of ten identical stages, creating a simple, distributed model. Such a model was used extensively, but not exclusively, for system simulations. Clock distribution in particular is confined to a narrow bandwidth, suitable for a distributed model, whereas a PRBS sequence occupies a much wider frequency band.

## 2.3 $f_t$ and $f_{max}$

$f_t$ and $f_{max}$ are important figures of merit for transistors, though neither directly predicts digital circuit speed. $f_t$ is the cutoff frequency and is important for analysing the bandwidth of small-signal amplifiers and the power gain of power amplifiers. The cutoff frequency is given by the well-known approximation [96, 97] as follows:

$$f_t = \frac{1}{2\pi\tau_{EC}} \approx \frac{1}{2\pi \left[r_E \left(C_{JE} + C_{JC}\right) + \tau_B + \tau_C + R_C C_{JC}\right]} \tag{2.17}$$

where:

$$r_E = \frac{kT}{qI_C} \tag{2.18}$$

The maximum frequency of oscillation is approximated by:

$$f_{max} \approx \sqrt{\frac{f_t}{8\pi R_B C_{JC}}} \tag{2.19}$$

Figure 2.16: Time domain analysis of inductive line.

Figure 2.17: Transmission line with inductive peaking at the load.

Figure 2.18: The effect of inductive peaking on transients.

Figure 2.19: Simple, distributed transmission line model. The model is shown single-ended for the sake of simplicity.

It is clear that $f_t$ and $f_{max}$ of VIP-2 are very dependent on the collector current density (and temperature for $\tau_B$ & $\tau_C$). $f_t$ and $f_{max}$ reach their global maxima at around 3.8 mA/$\mu$m<sup>2</sup>, as seen in fig. 2.20. Critical transistors (where switching speed is paramount) are generally biased to operate around this point, but it does result in a significant current (and power) consumption. The heat generated by large circuits such as a CDR circuit can easily destroy the circuit if preventive and remedying measures are not taken. Less critical parts of a circuit could be more economical with respect to current density without any significant impact.
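The approximations in eqs. 2.17-2.19 can be turned into a small calculation to see how the individual delay terms add up. The sketch below is hedged accordingly: every device parameter in the example (capacitances, transit times, resistances, collector current) is a hypothetical round number chosen for illustration, not VIP-2 data, and the function names are not taken from any simulator.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant [J/K]
Q_E = 1.602176634e-19   # elementary charge [C]


def emitter_resistance(i_c: float, temp_k: float = 300.0) -> float:
    """Dynamic emitter resistance r_E = kT / (q I_C), eq. 2.18."""
    return K_B * temp_k / (Q_E * i_c)


def cutoff_frequency(i_c: float, c_je: float, c_jc: float, tau_b: float,
                     tau_c: float, r_c: float, temp_k: float = 300.0) -> float:
    """f_t = 1 / (2 pi tau_EC) with tau_EC as in eq. 2.17."""
    tau_ec = (emitter_resistance(i_c, temp_k) * (c_je + c_jc)
              + tau_b + tau_c + r_c * c_jc)
    return 1.0 / (2.0 * math.pi * tau_ec)


def max_oscillation_frequency(f_t: float, r_b: float, c_jc: float) -> float:
    """f_max ~ sqrt(f_t / (8 pi R_B C_JC)), eq. 2.19."""
    return math.sqrt(f_t / (8.0 * math.pi * r_b * c_jc))


if __name__ == "__main__":
    # Hypothetical device parameters, for illustration only.
    params = dict(c_je=10e-15, c_jc=3e-15, tau_b=0.15e-12,
                  tau_c=0.20e-12, r_c=5.0)
    r_b = 20.0  # hypothetical base resistance [ohm]
    for i_c_ma in (1.0, 3.0, 6.0):
        f_t = cutoff_frequency(i_c_ma * 1e-3, **params)
        f_max = max_oscillation_frequency(f_t, r_b, params["c_jc"])
        print(f"I_C = {i_c_ma:.1f} mA: f_t = {f_t / 1e9:6.1f} GHz, "
              f"f_max = {f_max / 1e9:6.1f} GHz")
```

In this approximation only $r_E$ depends on the collector current, so $f_t$ rises monotonically with $I_C$. The roll-off beyond the optimum current density seen in the measured curves is not captured by these formulas, so the sketch is only meaningful below that optimum.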
In some cases, such as current sources, $f_t$ and $f_{max}$ should actually be minimised. Stability is the hallmark of good sources and slower transistors become less susceptible to noise, as will be shown in the following section.

Emitter followers have a fairly constant collector current, but the same is not true for the differential stage. Current will almost exclusively pass through only one of the two collectors when the stage is not switching, as shown (indirectly) in eq. 2.5. The solution is to set the current source to drive twice the optimum current for either transistor. The idea is that both transistors will have optimum current density, and thus optimum switching speed, at the critical switching moment.

Figure 2.20: $f_t$ and $f_{max}$ for VIP-2 as a function of collector current density.

#### 2.4 Current sources

CML based circuits, as well as other designs presented in this thesis, assume ideal current sources. The physical realization of such sources is a compromise between complexity and performance.

#### 2.4.1 Resistive current source

The simplest current source takes the form of a resistor, as shown in fig. 2.21. This is a very simple solution with a compact layout. The resistor adds very little parasitic capacitance to the common emitter node, which is helpful in maintaining the impedance at high frequencies.

The simplicity has its price. The current through the resistor is not constant, but proportional to the voltage over the resistor, $I_S = V_S/R_{CS}$. As a result, the current is sensitive to variations in the common emitter node voltage, supply voltage ($V_{SS}$), process and ambient temperature. Obviously, $V_{SS}$ can be adjusted to optimize the output voltage swing to a particular process outcome, but this will impact the chip globally. Furthermore, the voltage dependency means that the switching noise in the differential stage is not decoupled from $V_{SS}$. Each source may only be a small contributor, but numerous sources on a chip exhibiting synchronous switching will make their presence felt. The noise will have a direct impact on analog circuits, but could be acceptable for CML (though it would cause jitter in the eye-diagram).

Figure 2.21: Resistive current source.

#### 2.4.2 Current mirror

A more complex current source is able to mitigate the short-comings of the resistive current source. A current mirror is based on a transistor with a controlled reference current. The reference current is mirrored in one or more load transistors. A current mirror is shown in fig. 2.22. The current mirror contains resistors, $R_s$, to make the (current) source less dependent on the collector voltage and to dampen any resonance with $V_{ss}$. The resistors are usually set to have a voltage drop of 200 mV at nominal supply voltage, $V_{ss} = -4.2$ V, and have minimum width with respect to current density (to save space). The reference resistor, $R_{ref}$, fixes the mirror current at the desired DC level and takes some heating off the transistor.

Figure 2.22: Current mirror with internal bias.

The current mirror is less sensitive to variations in $V_{ss}$ relative to the resistive current source. The change in supply voltage, $\Delta V_{ss}$, is felt directly over $R_{cs}$ for the resistive current source:

$$\Delta I_s \approx \Delta V_{ss} / R_{cs} \tag{2.20}$$

The voltage over the reference transistor in the current mirror can roughly be assumed to be constant, and $\Delta V_{ss}$ would thus be distributed over $R_{ref}$ & $R_s$:

$$\Delta I_s \approx \Delta V_{ss} / (R_{ref} + R_s) \tag{2.21}$$

The implication is that the current source will show less variation, given $R_{ref} + R_s > R_{cs}$. This is a reasonable assumption, all else being equal.

An external control of the bias voltage may be desirable in some cases. This is shown in fig. 2.23.
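The supply sensitivities of eqs. 2.20 and 2.21 can be compared with a few lines of arithmetic. The sketch below uses hypothetical resistor values and a hypothetical supply disturbance purely for illustration; only the two formulas themselves come from the text.

```python
def delta_i_resistive(delta_v_ss: float, r_cs: float) -> float:
    """Current change of a resistive source, eq. 2.20."""
    return delta_v_ss / r_cs


def delta_i_mirror(delta_v_ss: float, r_ref: float, r_s: float) -> float:
    """Current change of a current mirror, eq. 2.21."""
    return delta_v_ss / (r_ref + r_s)


if __name__ == "__main__":
    delta_v_ss = 0.1  # hypothetical supply disturbance [V]
    r_cs = 600.0      # hypothetical resistive-source resistor [ohm]
    r_ref = 800.0     # hypothetical mirror reference resistor [ohm]
    r_s = 50.0        # hypothetical emitter resistor [ohm]

    di_res = delta_i_resistive(delta_v_ss, r_cs)
    di_mir = delta_i_mirror(delta_v_ss, r_ref, r_s)
    print(f"resistive source: dI = {di_res * 1e6:6.1f} uA")
    print(f"current mirror:   dI = {di_mir * 1e6:6.1f} uA")
    print(f"ratio mirror/resistive = {di_mir / di_res:.2f}")
```

The ratio is simply $R_{cs}/(R_{ref} + R_s)$, so the mirror only shows less variation when $R_{ref} + R_s$ exceeds $R_{cs}$, which is the condition stated in the text.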
An external bias signal, $V_{bias}$, is connected to the current mirror through a resistor, $R_{bias}$, of the same size as $R_{ref}$<sup>9</sup>. Setting $V_{bias}$ to the same level as $V_{GND}$ would thus result in roughly doubling the current. The bias voltage originates off-chip, and some stabilization is required to prevent undesirable interference from noise sources such as switching noise. This is achieved by inserting a relatively large capacitor (C=2.89 pF). The capacitor is connected through a resistor ($R=8.5~\Omega$) with the intention of damping possible oscillations. The damping resistor is much smaller than the bias resistor to ensure that noise will be absorbed before reaching the reference voltage node.

<sup>9</sup>This is simply a reasonable size; no quantitative argument is implied.

Figure 2.23: Current mirror with external bias.

As previously mentioned, key transistors in the design have a current density of $3.5\text{-}4.5~\text{mA}/\mu\text{m}^2$ to maximize $f_t$ and $f_{max}$. An optimized transistor is obviously not desirable for a current source, which should react as slowly as possible (or ideally, not at all) to any variation (primarily) in collector voltage. The current density must therefore be set to a much different value. A higher current density would be both wasteful and outside the safe envelope of operating conditions, and the remaining option is to decrease current density by increasing the size (or number) of the transistor(s). In general, the current mirror transistors have been limited to a current density around or slightly less than 1 mA/$\mu$m<sup>2</sup> (commonly 0.95 mA/$\mu$m<sup>2</sup>). Lower current densities are easily achieved but would take too much space (too large or too numerous transistors) to be practical.

The current mirror load, m, has limitations. The bias current feeds the base currents of both the reference and controlled transistors.
This makes the mirror dependent on m, and $R_{ref}$ has to be adjusted accordingly. Parasitic capacitance, $C_{cb}$ in particular, is another issue. It couples the switching noise to the biasing voltage. Experience indicates a reasonable limit for m of around four. The simplified current mirrors have been shown as driving m load transistors belonging to a single differential stage, but the load transistors usually belong to parallel differential stages and are not necessarily of the same size as the reference transistor.

An external bias signal requires a DC pad. Having several separate circuits on the same chip with independent biasing would require an equal number of additional pads. Obviously, using a common bias signal (and a single pad) for all current mirrors is the most realistic choice, particularly for small circuits.

## 2.5 Layout

A schematic is a two-dimensional functional description of a circuit. Signals are either drawn or represented by labels. Neither the functional blocks nor the signals in a schematic are limited by physical constraints. Some of these limitations first show up during the layout step of the design process.

A medium-size circuit, a static divider, is used to demonstrate some of the issues at stake. The core of the static divider is shown in fig. 2.24, with some of the layout features visible.

Figure 2.24: Layout of a CML static divider.

#### 2.5.1 Power supply & distribution

Circuits are powered by voltage sources. The external sources should be stable and immune to noise. The conditions on and off-chip will be different. Wires, leads, pins, packages, bonding wires and pads will inevitably add distributed resistance, capacitance and inductance to an otherwise ideal, external power supply. The internal power distribution also suffers from limitations. Inductance may be negligible and decoupling capacitance benevolent, but the resistance has a significant impact.

The digital CML supposedly exhibits a constant current consumption.
However, the current sources are not perfect. As a result, switching noise is injected into the power supply every time a digital gate or buffer changes state. The digital circuits are designed with an adequate noise-margin, but mixed-mode<sup>10</sup> circuits are vulnerable. The digital noise will affect the sensitive analog circuits.

An obvious solution is to have a completely separate analog and digital power supply [98]. The separation should preferably encompass separate external power supplies as well. This requires additional power supply pads or sharing the existing pads between the supplies. The concept of separate supplies also offers the possibility for local voltage level optimisation for performance or power consumption.

An intermediary solution is achieved by isolating only one of the supply levels, preferably the layer shielding the signal lines. This method has been used in the CDR circuit presented in chapter 6. The VCO in the CDR circuit has a separate $V_{SS}$ to provide isolation, but also to save power<sup>11</sup> and optimize the collector current density in the VCO. A completely separate supply, including $V_{GND}$, would have required more pads than were available; most, if not all, of the circuits designed during the OptCom project have been pad limited.

Another precaution is to lay out the digital and analog circuitry in different sections of the chip. The two (or more) sections should be separated by guard rings. This is particularly important for CMOS where a grid style power supply net will be insufficient to protect against substrate noise. The guard rings protect against noise by being low ohmic.

Multiple power supply pads (and bonding wires etc.) are advantageous with respect to resistance and inductance ($L_{total} = L \parallel ... \parallel L = L/n$). Any available pad not used for signalling should be assigned to power supply.
AC signals use a GSG configuration for shielding, inherently benefiting $V_{GND}$ access. The problem is usually restricted to finding enough suitable pads for $V_{SS}$. Multiple pads also reduce current density, which prevents electro-migration and decreases the internal voltage drop.

Noise is capacitively coupled into transmission lines when the lines are adjacent to the substrate, other transmission lines and circuits. The noise can be effectively reduced by employing grounded shields between the transmission line and the noise sources. A simple shield is illustrated in fig. 2.25. A grounded metal layer (M1) is used to provide a shield between signal and substrate. The shield is extended by two connected, parallel sidebars (M2) to reduce noise coupling through fringe capacitance. Nearly complete shielding is achieved by enclosing the transmission line, as shown in fig. 2.26. Ideally, no capacitive coupling should exist between the transmission line and the exterior. Full enclosure requires a wide space and an additional metal layer, which may not be attractive with regard to routing.

Figure 2.25: Transmission line using partial shield.

Figure 2.26: Transmission line fully enclosed by shield.

The shield should preferably employ a separate net $(V_{GND,Shield})$ that is solely used for shields. If this type of connection is not possible due to layout and space constraints, then the noisy digital ground $(V_{GND,Digital})$ can be used for shields, although this does not provide ideal shielding. The digital ground is used even though the digital circuits are themselves both more noise generating and more noise immune than analog circuits. Most of the effect on differential signalling will be common mode and have little impact. Using the sensitive analog ground $(V_{GND,Analog})$ for shielding both analog and digital routing would couple digital noise directly to analog circuits. Passing a signal as a shielded differential pair will diminish the effects due to common mode noise and variation between local grounds, as noted in the section on differential signalling (2.2.2).

<sup>10</sup>Containing both analog and digital circuitry.

<sup>11</sup>The VCO requires a $V_{SS}$ of -5.0 V while -4.2 V is sufficient for the remaining circuits. The VCO is fairly small in terms of power consumption relative to the remainder. Using a global $V_{SS}$ of -5.0 V would increase power consumption by about 19 % in the remainder.

The circuits presented in this thesis use differential pairs for all AC signals. Only partial shielding is employed, as demonstrated in fig. 2.27. The transmission lines (M4) are shielded by a ground layer (M2). The space between the differential pairs is sufficient to minimize fringe coupling.

Figure 2.27: Shielding of transmission lines.

The supply voltages, $V_{GND} \& V_{SS}$, are contained in coherent (but broken), separate metal layers, as seen in fig. 2.24. The two metal layers are adjacent: $V_{SS}$ uses M1 and $V_{GND}$ uses M2. This has a number of advantages:

- Supply routing at circuits is simplified as the supply voltages become almost omnipresent.
- Electro-migration is simpler to deal with. Large circuits will draw a significant current. Even large, unbroken swaths of metal may not necessarily be enough to avoid electro-migration, but they certainly help.
- Low voltage drop. A significant current will force a voltage drop in the metal layers, however low the resistance. The result is that the supply voltage available on-chip is less than the external supply biasing, though there will also be losses between the external supply and the pads. Substantial areas of metal reduce resistance and thus voltage drop and noise.
- Decoupling capacitance.
There is some capacitance between the metal layers, though this is generally much smaller than that of dedicated capacitors, where layer spacing (d) and dielectric material (ε) are optimised: C = εA/d if fringe capacitance is ignored. In most processes, capacitors are placed in the base layers, underneath the metal. This allows designers to put distributed decoupling capacitance underneath any area that is used only for metal routing. VIP-2 uses a different approach, where dedicated capacitors are sandwiched between two adjacent metal layers (M1 & M2). This is somewhat awkward with respect to routing, given the limited number of metal layers (4) available, but acceptable for SSI/MSI. The presence of unbroken M1 & M2 facilitates the utilisation of unused space on a chip as decoupling capacitors. The placement of decoupling capacitors has no impact on routing, as the differential pairs are placed above M2 (in M4).
- Shielding. The wide supply metal surrounding the circuits acts as low-ohmic guard rings, minimising voltage fluctuations and absorbing radiation.

The metal layers are opened up at active devices to minimize parasitic, capacitive coupling. This is a compromise between providing a high quality power supply and limiting parasitics. The compromise is partially based on qualified guesswork, as the coupling between metal and transistor cells remains undefined in VIP-2.

# Chapter 3

## Static divider

A divider is a circuit performing frequency division, typically 1:2, of an incoming clock signal. A divider can be either static or dynamic, depending on the topology<sup>1</sup>. The two types of topology have different characteristics. The static divider is based on latches, and can operate over a very wide frequency band, as shown in fig. 3.1. The band is limited by the operating speed of the latches (at high frequencies) and the transition (rise and fall) time of the incoming clock signal (at low frequencies).

Figure 3.1: Properties of static dividers vs. dynamic dividers.
Failure occurs rather abruptly at the maximum operating speed. The transition time is important because the latches become unstable when switching occurs too slowly. The static divider would operate down to DC, or even asynchronously, if given a square wave clock signal.

The dynamic divider resonates at a particular frequency, and the resonance is induced by the incoming clock signal. The resonance necessarily occurs over a much smaller bandwidth than that of a static divider, to be both effective (high gain) and immune to noise (by placing the noise outside the band). However, a resonating filter can be made to operate at a greater speed than a static divider. A dynamic divider must certainly be considered if the target frequency lies beyond the bandwidth of latches.

<sup>1</sup>It is possible to give a static divider dynamic properties, limiting its range to a narrower bandwidth.

Dividers can be used in several transceiver circuits. MUX & DEMUX circuits with multiple stages require several mutually synchronized clocks that are all fractions of a common frequency. Some phase detector architectures are based on a quadrature phase clock, which can be generated by dividers (both static and dynamic) from an ordinary clock source.

## 3.1 T flip-flop

A static divider is created by connecting two D latches in the form of a T flip-flop, as shown in fig. 3.2 (with I/O buffers).

Figure 3.2: Schematic of a T flip-flop.

The incoming clock signal clocks the two latches on alternate phases, and the inverted flip-flop output signal is connected to the flip-flop input. The signals are differential, so the output signal is inverted simply by swapping the two complementary lines. The value stored by the T flip-flop changes once every clock cycle. Each new value stored is the inverse of the previous value, thus resulting in a new clock signal at exactly half the controlling clock frequency.
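The toggling behaviour can be illustrated with a purely behavioural model. The sketch below treats the two D latches as ideal level-sensitive storage elements that are transparent on opposite clock phases. It ignores propagation delay, analog signal levels and the differential signalling, and the function names are hypothetical.

```python
def divide_by_two(clock):
    """Behavioural T flip-flop: two D latches with inverted feedback."""
    master, slave = 0, 0
    out = []
    for clk in clock:
        if clk:
            # Clock high: first latch is transparent and samples the
            # inverted flip-flop output; second latch holds its value.
            master = 1 - slave
        else:
            # Clock low: second latch is transparent and copies the
            # first latch; first latch holds its value.
            slave = master
        out.append(slave)
    return out


def rising_edges(signal):
    """Count 0 -> 1 transitions in a sampled binary signal."""
    return sum(1 for a, b in zip(signal, signal[1:]) if a == 0 and b == 1)


clock = [0, 1] * 8                 # eight clock periods, two samples each
divided = divide_by_two(clock)
print("clock  :", clock)
print("divided:", divided)
print("edges  :", rising_edges(clock), "->", rising_edges(divided))
```

The output has one rising edge for every two rising edges of the input clock, which is the divide-by-two behaviour described above.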
The principle is identical to that presented in [78], a design using the same process (VIP-2), and in [77], using the preceding process (VIP-1). The divided clock signal is accessible in two places, i.e. at the interfaces between the D latches. The signals at the two extraction points have the same frequency, but their phases are spaced $90^{\circ}$ apart. A quadrature clock signal is obtained by using both of the differential signals simultaneously.

The static divider is driven by an external clock signal, which is buffered. The two divided output signals are also buffered. The input buffer provides useful regeneration of the signal in case of a substandard input signal. The output buffers are critical, because the core is highly sensitive to load. Any additional load placed on the core will decrease its maximum operating frequency. The buffers use emitter followers to provide impedance transformation and minimize the load. A lack of buffers would also make the maximum operating speed dependent on the external load. The maximum operating speed of static divider circuits is often used as a benchmark for process performance. In these cases, it is customary to use a single, buffered output to reduce the load as much as possible. This results in an asymmetric circuit that is faster, but less versatile.

## 3.2 D latch

Two D latches are used to realize the T flip-flop of the static divider. The latches use differential CML logic, as shown by the schematic in fig. 3.3.

Figure 3.3: Schematic of a high-speed D latch.

The interfaces are also current mode, generating the voltage input signal at the ingress. A pair of inductors was added to the load to introduce some inductive peaking, increasing the signal slope and improving the eye diagrams ever so slightly. The inductors are in the form of transmission lines (microstrips in M4 over M2) with an effective inductance of about 20 pH.
The clock signals have an extra stage of emitter followers for the purpose of level shifting to the appropriate voltage level. The extra emitter followers also cause some extra propagation delay before the clock signal reaches the switching transistors. The additional delay can have a negative impact on data capture if the data and clock signals are not optimally synchronized. In this particular case, the data is inherently synchronized, and the extra propagation delay itself is not an issue. All transistors are biased at an average current density of 3.8 mA/µm<sup>2</sup> to optimize switching speed. The currents in the emitter followers are mostly DC, but the currents in the switching transistors (data and clock) are pure AC (either 0 or 7.6 mA/µm<sup>2</sup>). This ensures that the switching transistors are at their optimum biasing condition at the switching point, midway between the two current levels.

The layout of the T flip-flop is of some interest, and it is therefore shown in fig. 3.4.

Figure 3.4: Layout of a CML static divider core.

A typical design flow would include making the layout of each D latch, and then connecting the circuits appropriately. The schematic of the T flip-flop, fig. 3.2, shows how the inverted output data is fed back as input data. In this layout, however, the two latches face each other and share the same I/O ports, seen as horizontal red lines (M2) in the center, between the D latches. The two groups of transistors, top and bottom, each represent a D latch. Transmission lines (long M4 microstrip curls shown at either side of the layout) are employed at the output to achieve some inductive peaking, increasing the bandwidth by about 6% according to simulations. The inductors do give the divider a slightly dynamic touch. The layout shows only the core itself (the T flip-flop); the input and output buffers/amplifiers are omitted. The signals are shared between the latches and the buffers. External load has a major impact on the rise and fall times as it loads the core directly. Buffers are placed between the external load and the core to minimize the load and make the core less dependent on the external load.

## 3.3 Buffers

Simple amplifiers are used to buffer the input and output signals. The output buffers consist of differential stages with a couple of emitter followers, as shown in the schematic in fig. 3.5.

Figure 3.5: Schematic of a buffer based on a differential stage.

The buffers are non-linear, as described in section 2.1.1. They maintain a fairly fixed signal swing down to very low input voltages, because the output voltage switches rather steeply. The full voltage swing is $\sim R \times I_D$. There is seemingly no resistive load on the input of the output buffer, but this is because the resistive load is placed on the D latch, in parallel with the buffer load. Thus the inductive peaking also benefits the output buffer. The layout is shown in fig. 3.6.

Figure 3.6: Layout of a buffer based on a differential stage.

The input buffer is similar to the output buffer, but contains a resistive load on the input, matching the transmission line. The component sizes are also optimized to drive the static divider rather than provide a minimum load on the core.

## 3.4 Simulation

Simulations indicate that the D latches are capable of capturing data in excess of 150 Gb/s, but experience indicates that the extracted parasitics are likely to be a bit on the pessimistic side. A MUX, using nearly identical D latches for D flip-flops, was manufactured during the OptCom project [74]. Measurements showed "acceptable" eyes at 165 Gb/s, a testimony to the capabilities of VIP-2. The static divider cannot be expected to match this result, simply because the core load is greater than in a chain of D latches; each latch has to drive a subsequent latch as well as an output buffer.
The static divider would likely be faster than a D flip-flop if the external load was removed, but this would void any useful application.

The static divider is simulated by extracting parasitics and adding transmission line models for the inductors and signal lines (albeit short) between buffers and core. The T flip-flop is sufficiently small to be extracted in its entirety. This is advantageous, because the D latches are so tightly bound. The transmission lines cannot be accurately simulated using lumped, frequency-specific models. However, it does help that the input and output signals each consist of a prominent spectral component, for which the impedances can be assumed to be constant. The impedances can then be matched at that particular frequency, most appropriately placed in the vicinity of the maximum operating frequency, to preserve the wideband operation of the static divider. The results are shown in figs. 3.7 & 3.8.

Figure 3.8: 50 GHz quadrature phase clock output (simulated).

It takes a little while for the clock signal to induce an oscillation, cross the buffers and switch the core. This is due to the initial condition with open latches, when both latches are in a metastable state, and will have no impact on an application. The static divider operates up to at least 114 GHz, providing a safe margin for the 100 GHz target. The static divider is sensitive to core loading, and one of the output buffers could be removed if only a single phase was required<sup>2</sup>.

## 3.5 Measurement

A photo of a particular version of the static divider is shown in fig. 3.9.

Figure 3.9: Photo of a static divider chip.

The four peaking inductors are rather prominent. The tangerine areas are MIM capacitors used for decoupling, and occupy most of the unused space. The capacitors are placed below the ground plane and should not affect the signals. A large number of pads are used, all of them connected; the design is clearly pad limited.
Not all of the pads can be utilized at the same time though, as there is a practical limit to the proximity of probes. The signals are limited to three: two AC and one DC. The rest of the pads are for $V_{GND} \& V_{SS}$. The design is differential, but the input configuration is single-ended. This is done to simplify the test setup, as it is rather difficult to transmit a high-speed differential clock signal while maintaining the relative phase at the ingress. One of the two differential input signals is therefore connected to a DC bias that may be adjusted externally to a suitable voltage level. The bias signal is stabilized by an on-chip capacitor, seen in the lower left corner. The remaining single-ended input signal must have twice the voltage amplitude to keep the sensitivity constant. A similar choice has been made for the output signal, where only one of the four buffered signals is routed to a pad. Again, it is difficult to trust the relative phase shifts between the four output signals once they reach the detector. Given the architecture, it can safely be assumed that the other three output signals operate correctly if the first one does.

<sup>2</sup>A common practice when speed is promoted at the cost of usability (relevance), e.g. in benchmarking.

The fastest clock signal that can be generated electrically in the MEL lab at MC2 is 43.5 GHz. This is only a fraction of the potential bandwidth, but sufficient to verify that the operation is correct and according to specifications. The results are shown in figs. 3.10 & 3.11.

Figure 3.10: Single-ended 43.5 GHz clock input (measured).

Figure 3.11: Single-ended 21.75 GHz clock output (measured).

The operation of the static divider is thus verified, but not the maximum operating speed. The purpose of the static divider is to perform the conversion of a 100 GHz differential clock signal to a 50 GHz quadrature phase clock signal.
The divider was therefore combined with two different 100 GHz VCO circuits, to be presented in chapter 4, to create the required 50 GHz quadrature clock versions of the VCOs. The relatively low frequency of the resulting quadrature signal also provided an opportunity for easy measurements and verification<sup>3</sup>. The combined VCO+SDIV circuits verify that the static divider operates at 100 GHz+. The stand-alone static divider was made as a back-up for testing in case of a design flaw in either of the combined circuits. Such an error could have made it impossible to debug the circuits separately, whereas the stand-alone static divider can be tested and verified independently. The circuit is not intended for benchmarking, as it has a quadrature output that loads the core excessively. Finally, some specifications for the stand-alone 100 GHz+ static divider are given in table 3.1.

<sup>3</sup>Measurements up to 50 GHz (and slightly beyond) can be performed directly using the Agilent 8565EC, without any mixers etc.

| Circuit                     | Static divider |
|-----------------------------|----------------|
| Width                       | 860 µm         |
| Height                      | 560 µm         |
| Area                        | 0.48 mm²       |
| Supply voltage              | -4.2 V         |
| Current consumption (core)  | 142 mA         |
| Power consumption (core)    | 0.60 W         |
| Current consumption (total) | 224 mA         |
| Power consumption (total)   | 0.94 W         |
| Pad pitch                   | 150 µm         |
| Transistor count            | 26             |

Table 3.1: Specifications for the static divider circuit.

# Chapter 4

## VCO

A VCO circuit is used to generate the clock signal in a CDR circuit. The CDR circuit may require a full-rate, half-rate or quadrature half-rate clock depending on the architecture of the phase detector. The quadrature half-rate clock is typically generated using a combination of a full-rate clock and a divider, though there are other possibilities as well.
There are several different types of VCO circuits, two of which have been implemented in this project. Several versions have been made of these two VCO types, differentiated by their center frequency, type of inductor, output buffer design and additional circuitry. Some of the VCO versions have been combined with an integrated static divider and some additional buffering. HBT technology is generally regarded as the most suitable technology for realizing low phase noise VCOs due to its inherently low 1/f noise [82, 84, 85, 86, 78, 90, 80, 99, 81], although oscillator topology and design are also crucial for the results.

## 4.1 Phase noise & jitter

Clock circuits are found in many designs. VCO circuits are commonly used in front-ends for synchronization (optical/electrical) or band selection (wireless). The increasing frequencies of communication systems and the relatively narrowing band gaps force the designer to pay close attention to the quality of the clock signal. The quality of a clock signal is typically described by its output power and its phase noise or jitter. The phase noise is a frequency-domain description: the ratio of the power density at a particular offset frequency from the clock center frequency to the power density at the clock center frequency, usually measured over a nominal bandwidth of 1 Hz. Jitter can be defined in several different ways, e.g. period jitter, cycle-to-cycle jitter and accumulated jitter. Among these definitions, period jitter is the one most often encountered. Period jitter is demonstrated in the clock waveform in fig. 4.1.

Figure 4.1: Period jitter in clock signal.

It can be described as the difference between the measured period, $T_0$, and the ideal period, $T_p$, of a clock cycle:
$$T_j = T_0 - T_p \tag{4.1}$$

The random quantity $T_j$ must have a zero mean, since the average clock frequency over a sufficiently long time span is presumed to remain constant. The RMS of $T_j$ can be defined as:

$$RMS\left(T_{j}\right) = \sqrt{\left\langle T_{j}^{2} \right\rangle} \tag{4.2}$$

where $\langle \cdot \rangle$ denotes the time average.

The next step is to define the phase noise spectrum, $\mathcal{L}(\Delta f)$. The power spectral density of the clock signal is denoted $S_C(f)$. The phase noise spectrum is then defined as the ratio between the power density at the frequency of interest, $S_C(f_c + \Delta f)$, and the power density at the clock center frequency, $S_C(f_c)$. The definition of $\mathcal{L}(\Delta f)$ is illustrated in fig. 4.2.

Figure 4.2: Definition of phase noise spectrum.

The ratio is measured in dB relative to the carrier and denoted dBc. The phase noise spectrum (in dBc) can be expressed as:

$$\mathcal{L}(\Delta f) = 10 \cdot \log_{10} \left[ \frac{S_C \left( f_c + \Delta f \right)}{S_C \left( f_c \right)} \right] \tag{4.3}$$

The phase noise thus represents the ratio between the spectral power density at the frequency of interest and the peak spectral power density at the center frequency, in dBc. The phase noise is therefore a negative value, except at the carrier itself, where it is 0 dBc by definition.
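As a small numerical illustration of eq. 4.3, the sketch below converts a pair of power densities into dBc. The function name and the example densities are assumptions chosen for illustration; they are not measured values from this work.

```python
import math


def phase_noise_dbc(s_offset, s_carrier):
    """Eq. 4.3: power density at an offset relative to the carrier, in dBc."""
    return 10.0 * math.log10(s_offset / s_carrier)


# Assumed example: the density at the offset is nine decades below the carrier.
S_CARRIER = 1.0e-3    # power density at f_c [W/Hz], illustrative
S_OFFSET = 1.0e-12    # power density at f_c + delta_f [W/Hz], illustrative

print(f"L = {phase_noise_dbc(S_OFFSET, S_CARRIER):.1f} dBc")
print(f"At the carrier: {phase_noise_dbc(S_CARRIER, S_CARRIER):.1f} dBc")
```

Every factor of ten in the density ratio corresponds to 10 dB, so a ratio of $10^{-9}$ gives -90 dBc, and the carrier itself sits at 0 dBc.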
Using the Fourier series expansion, a clock signal with phase noise can be written as a sinusoid:

$$C(t) = A \cdot \sin(2\pi f_c t + \theta(t)) = A \cdot \sin\left(2\pi f_c \left(t + \frac{\theta(t)}{2\pi f_c}\right)\right) \tag{4.4}$$

The period jitter becomes:

$$T_j = \frac{\theta(t)}{2\pi f_c} \tag{4.5}$$

The next step is to show the relationship between the period jitter, $T_j$, and the phase noise spectrum, $\mathcal{L}(f)$, which is:

$$RMS\left(T_{j}\right) = \frac{1}{2\pi f_{c}}\sqrt{\left\langle \theta^{2}\left(t\right) \right\rangle} = \frac{1}{2\pi f_{c}}\sqrt{2\int_{0}^{\infty} 10^{\frac{\mathcal{L}\left(f\right)}{10}} \, df} \tag{4.6}$$

In some applications, such as SONET and Ethernet, the jitter is only monitored within a specific band. In such cases, the RMS jitter within the band can be calculated as:

$$RMS(T_{j}) = \frac{1}{2\pi f_{c}} \sqrt{2 \int_{f_{1}}^{f_{2}} 10^{\frac{\mathcal{L}(f)}{10}} \, df} \tag{4.7}$$

These expressions will prove useful later on in the chapter.

## 4.2 Design flow

Phase noise is one of the most important parameters for a VCO circuit. The criteria for achieving low phase noise are given by Leeson's equation [100] for phase noise:

$$\mathcal{L}\left(\Delta\omega\right) = 10 \cdot \log_{10} \left\{ \frac{2 \cdot F \cdot k \cdot T}{P_{sig}} \left[ 1 + \left(\frac{\omega_0}{2 \cdot Q \cdot \Delta\omega}\right)^2 \right] \cdot \left[ 1 + \frac{\Delta\omega_{1/f^3}}{|\Delta\omega|} \right] \right\} \tag{4.8}$$

F represents the noise figure of the transistor, k is Boltzmann's constant ($8.617 \times 10^{-5}$ eV/K, or $1.381 \times 10^{-23}$ J/K), T is the absolute temperature, $P_{sig}$ is the signal power in the tank, Q is the loaded Q of the tank, $\omega_0$ is the oscillation frequency, $\Delta\omega$ is the offset from the oscillation frequency, and $\Delta\omega_{1/f^3}$ is the corner frequency of the 1/f noise.
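Eqs. 4.7 and 4.8 can be combined numerically to get a feeling for the orders of magnitude involved. The sketch below is a minimal illustration under stated assumptions: F is entered as a linear noise factor (not in dB), Boltzmann's constant is used in J/K so that $P_{sig}$ can be given in watts, and all parameter values (Q = 10, 1 mW tank power, 100 kHz corner, 10 kHz to 10 MHz jitter band) are hypothetical rather than taken from the circuits in this thesis.

```python
import math

K_BOLTZMANN = 1.380649e-23  # [J/K]; SI value, so that P_sig can be in watts


def leeson_dbc(delta_f, f0, q_loaded, p_sig, noise_factor,
               temp=300.0, f_corner=0.0):
    """Eq. 4.8 evaluated at offset delta_f [Hz]; result in dBc/Hz.

    Frequencies only enter as ratios, so Hz can replace rad/s.
    noise_factor is the linear F (an assumption; not a value in dB).
    """
    thermal = 2.0 * noise_factor * K_BOLTZMANN * temp / p_sig
    resonator = 1.0 + (f0 / (2.0 * q_loaded * delta_f)) ** 2
    flicker = 1.0 + f_corner / abs(delta_f)
    return 10.0 * math.log10(thermal * resonator * flicker)


def rms_jitter(phase_noise, fc, f1, f2, points=2000):
    """Eq. 4.7: band-limited RMS jitter [s] from L(f) given in dBc/Hz.

    Uses trapezoidal integration on a logarithmically spaced grid.
    """
    ratio = (f2 / f1) ** (1.0 / (points - 1))
    freqs = [f1 * ratio ** i for i in range(points)]
    dens = [10.0 ** (phase_noise(f) / 10.0) for f in freqs]
    area = sum(0.5 * (dens[i] + dens[i + 1]) * (freqs[i + 1] - freqs[i])
               for i in range(points - 1))
    return math.sqrt(2.0 * area) / (2.0 * math.pi * fc)


# Hypothetical 100 GHz oscillator, values chosen only for illustration.
F0 = 100e9


def example_vco(f):
    return leeson_dbc(f, f0=F0, q_loaded=10.0, p_sig=1e-3,
                      noise_factor=10.0, f_corner=100e3)


print(f"L(1 MHz) = {example_vco(1e6):.1f} dBc/Hz")
print(f"RMS jitter, 10 kHz to 10 MHz = "
      f"{rms_jitter(example_vco, F0, 1e4, 1e7) * 1e15:.1f} fs")
```

In the region where the resonator term dominates, doubling the loaded Q lowers the phase noise by about 6 dB, which is why the tank Q receives so much attention in the design flow.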
The equation cannot be used directly to develop a VCO architecture, but it is phenomenological and as such useful for understanding how phase noise can be minimized. The intrinsic noise figure of the transistor should be as low as possible. The InP HBT technology used in this thesis<sup>1</sup> has an advantage over HEMT technology because of its intrinsically low 1/f noise<sup>2</sup>. Power consumption should be limited to keep heating, and thus temperature, as low as possible. The signal amplitude in the tank should be maximized to increase the power in the tank, and the Q of the tank should be as high as possible. All of these factors have to be taken into account when attempting to achieve a low phase noise.

The oscillation frequency of a VCO circuit is usually adjusted by using a variable capacitor in the tank, such as a varactor diode with variable bias voltage. A varactor with high Q is important, but no dedicated varactor diode exists in the VIP-2 process. A reverse-biased transistor had to be used instead. Different transistor configurations (width, length, number of fingers, B-CE or BE-C coupling etc.) were simulated to find the diode (transistor) with the highest Q. The single-finger B-CE configuration was found to be the best choice in this respect. The superiority of the single-finger configuration was somewhat surprising, as multiple fingers were expected to offer lower base resistance for the same capacitance.

The design flow follows a procedure developed at MC2, cf. [99]. The example below describes the procedure for a Colpitt type VCO circuit, but the same procedure was used for all of the VCO circuits presented in this thesis. The procedure follows these iterative steps:

- 1. Loop gain analysis. A small-signal analysis is performed to ensure that 1) the loop gain is >1, i.e.
the gain will be greater than the losses, and that 2) the phase shift, $\rho$, indicates a resonance ( $\rho \in \mathbb{Z} \times 180^{\circ}$, e.g. $0^{\circ}$ ) at the frequency of interest, see fig. 4.3. These are simplified theoretical requirements, and the results of the small-signal analysis will not be accurate in the time domain, where the system is no longer linear. Some additional gain beyond the critical level of unity is desirable to take the inevitable losses into account and to ensure start-up. A gain between 2 and 5 is a useful target, as too much gain results in additional noise. The phase and the gain are found by inserting an ideal current source between the amplifier and the tank. The phase shift in the tank is the phase of the voltage response at this point. The gain of the tank is found by looking at the impedance in both directions: $Gain(f) = \frac{Re(Z_{ampf}(f))}{Re(Z_{tank}(f))}$. The threshold frequency, where the phase crosses the $\rho \in \mathbb{Z} \times 180^{\circ}$ boundary (and Gain>1), should be slightly higher than the target frequency, as small-signal analysis is somewhat more optimistic than large-signal analysis with regard to losses. Additional parasitics, found during extraction of the layout, will also tend to lower the oscillation frequency by increasing the capacitance of the tank.

Figure 4.3: The result of a small-signal analysis of a Colpitt type VCO circuit.

- 2. Waveform optimization. A harmonic balance simulation is performed where the collector current and the collector-emitter voltage waveforms are adjusted for minimum conduction angle<sup>3</sup>, see fig. 4.4. The drive level for the tank is fairly low in this example, but the well-defined trace serves to illustrate the point. Increased drive levels will cause distortion, but also increased signal power. The tank voltage should have a magnitude comparable to the breakdown voltage of the HBT<sup>4</sup>, to increase the power (and lower the phase noise) in the tank. If the tank voltage is too small, the transistor area must be increased and the design process has to restart from step 1. $V_{CE}$ of the HBTs is monitored, and the HBTs should not be allowed to enter the saturation region<sup>5</sup>.

Figure 4.4: Simulated DC characteristics and trajectory for a 3x0.5x2.6 HBT.

- 3. Varactor voltage sweep. The phase noise is checked over the varactor voltage range, which is the range of the control voltage for the VCO circuit. Any abnormal increase in the phase noise can normally be traced back to changes in the $I_C$ - $V_{CE}$ waveform performed in step 2. Unfortunately, the varactors have their best Q where the tuning efficiency $(\frac{\Delta C}{\Delta V})$ is lowest (reverse biased), and vice versa (no bias).
- 4. Layout generation. An appropriate layout is made, and the layout is extracted including parasitics and re-simulated. A secondary HB analysis is performed to ensure that the waveforms remain fairly unchanged.
- 5. Redesign of the layout. The layout introduces parasitics in the circuit, and the parasitics introduce losses. The layout must be redesigned if the layout step introduces an increase in phase noise. EM simulations (ADS Momentum) can be used to increase the accuracy of the simulation, by extracting S-parameter port models of the VCO circuit. The waveguides themselves appear to be more reliably modeled by the inherent transmission line models in ADS and Cadence, as long as they are not coiled. Step 1 must be repeated to ensure that both the loop gain and the frequency at the threshold phase are still in the valid range. Oscillation frequency, output power and relative tuning bandwidth are found using large-signal analysis.

<sup>1</sup>DHBT to be precise.

<sup>2</sup>The low 1/f noise is an advantage, but it is not sufficient to prove HBT superior under all circumstances; cf. [101].

<sup>3</sup>This will keep overlaying frequencies, and thus phase noise, to a minimum.

<sup>4</sup>Breakdown voltage for VIP-2 is 4.5 V.

<sup>5</sup>The saturation region of operation is characterised by a forward bias potential on both the base-emitter and the base-collector junctions (implying that $V_{BE} \geq V_{CE}$ ). Carriers are injected and removed when the transistor enters and leaves the saturation state, acting as a parasitic capacitance.

## 4.3 Testing

The VCO circuits with integrated static dividers can be tested in a straightforward manner, because their bandwidth is $\leq 50~\mathrm{GHz}$<sup>6</sup>. Measurements in the W-band (75-110 GHz) are more complex because they must be executed in an indirect manner. An external mixer is used to shift the clock signal to a frequency band within the range of the spectrum analyzer. The local oscillator (LO) of the spectrum analyzer has a range of 4.15-6.09 GHz and uses the 18th harmonic (74.7-109.62 GHz) to encompass almost the entire W-band. The mixing process results in mixing products that generate a number of false peaks. A standardized procedure has been developed by Agilent to identify the correct peak, representing the center frequency of the oscillator. The mixing introduces a significant attenuation (about 45 dB) that must be compensated for when the spectrum is interpreted; see Appendix D. The setup is shown in fig. 4.5.

Figure 4.5: VCO measurements in the W-band using harmonic mixer.

<sup>6</sup>The nominal bandwidth limit of the Agilent 8565EC spectrum analyser. The bandwidth is actually slightly higher.

The attenuation in the mixer, the probe and the 1 mm cable that connects them is frequency dependent. The manufacturers supply individual calibration sheets with all components, but the frequency dependency necessitates a manual conversion for each point. The attenuation for the mixer, probe and 1 mm cable (13 cm) at 100 GHz is 40.9 dB, 1.191 dB and 2.46 dB respectively, for a total of about 44.5 dB. The total attenuation over the W-band can be tabulated to ease the conversion. The table is found in Appendix D. The spectrum analyzer can also be programmed to adjust the measurements directly by entering the attenuation table point by point. This is by far the simplest approach when multiple VCO circuits are being tested.

The compensated single-ended power spectrum for the Colpitt VCO circuit #2, to be presented in section 4.6, is shown in fig. 4.6.

Figure 4.6: Colpitt VCO circuit #2 (mixed) power spectrum containing multiple peaks, most of them false. The correct peak, representing the VCO center frequency, is found by using a standardized identification procedure.

The output power is slightly above -10 dBm, representing a single-ended voltage swing of about 300 mV, which happens to match the expected level. The output power for the differential clock signal is 3 dB higher. The observed frequency band is sufficiently wide to show several mixing products, one of them rivalling the true representation in amplitude. The correct peak has to be identified before output power, oscillation frequency and phase noise can be measured. In this case, the centre peak is the correct one, while the other peaks represent various mixing products.

The phase noise can be measured once the correct peak has been identified and locked. The spectrum analyser is fitted with a phase noise module (a program) that can make either a sweep or a spot frequency measurement. The phase noise spectrum for the Colpitt VCO circuit #2 is shown in fig. 4.7 below.

Figure 4.7: Colpitt VCO circuit #2 (mixed) phase noise spectrum.

The voltage supplies were not required to be filtered, given the measured noise level<sup>7</sup>.

## 4.4 Colpitt VCO circuit #1 (microstrip)

The first design to be presented is a Colpitt type.
The only published VCO circuit using the VIP-2 process, [78], also happens to be a Colpitt type. Several other III-V Colpitt VCO circuits for W-band applications have been published, e.g. [102, 103, 87, 88, 89, 80, 81], and two of them were probably intended for a CDR circuit [80, 81]. The VIP-2 VCO circuit [78] uses the intrinsic capacitance of a diode-connected transistor, $C_{be}$, to dictate the resonance of the tank. This is quite feasible at 80 GHz, but the type of capacitor limits the dynamic bandwidth of the VCO circuit because the relative change in capacitance over a realistic control voltage span is not very large. Also, the oscillation frequency of W-band VCO circuits generally requires very small passives ($\omega_{osc} = \sqrt{\frac{1}{LC}}$), which makes the circuits vulnerable to the effects of parasitics. The variable (diode/transistor) capacitance becomes comparable to the parasitic capacitance in the tank, particularly the input capacitance of the transistors. The whole design process must take this sensitivity into account, and parasitics must be carefully modeled and limited. A similar design [90, 104], also in InP HBT, offers a better dynamic bandwidth and a more predictable behavior.

<sup>7</sup>Biasing and voltage supplies introduce a low level of phase noise. The impact of such noise sources is noticeable when sensitive measurements are carried out. Adequate shielding is necessary to obtain reliable results when VCO circuits with very low phase noise are being tested. Voltage sources are usually filtered with 1 Hz low pass filters and (inherently stable) batteries are used for biasing.

The schematic of a Colpitt type VCO circuit is shown in fig. 4.8. A more detailed schematic can be found in Appendix C, see fig. C.1.

Figure 4.8: Schematic of the Colpitt VCO circuit #1.

The differential VCO circuit contains a resonator tank, an output buffer, current sources and a tuning/biasing circuit.
The resonator consists of an inductor (an M4 microstrip over M2 ground) and a capacitor (B-CE junction capacitance of the tuning transistors, C & C', and parasitic capacitance, $C_{be}$, from Q2 & Q2'), with an amplifier (Q2 & Q2') in between. The oscillation frequency is roughly given by:

$$\omega_{osc} = \frac{1}{\sqrt{L \cdot \frac{C_{be,Q2} \cdot C}{C_{be,Q2} + C}}} \tag{4.9}$$

The frequency of the oscillator is tuned by biasing the capacitance, C & C'. The amplifiers (Q2 and Q2') are fairly large, to provide large currents for high output power, large signal power (low phase noise) and increased tuning range. Eq. 4.9 shows that the optimal tuning range occurs when $C_{be}$ and C have roughly the same size. The amplifier compensates for internal losses in the tank and affects the output power.

The VCO requires an output buffer to isolate the tank from the external load, and the buffer should drain as little signal power as possible. The output signal is connected to a simple, differential common base pair (Q1 and Q1'), providing good isolation at the cost of some additional power consumption.

The current sources and biasing are similar to the circuits used in the static divider in chapter 3. An attempt was made to combine the two current sources using large inductors (i.e. transmission lines) to isolate the two sides of the tank [80, 81, 102, 103, 87, 88, 89], but sufficient isolation could not be reproduced. Combining them would have enabled a more efficient current source with respect to linearity and area. The three large diodes $(D_1, D_2 \& D_3)$ implement a virtual ground node between the two resonators. The diodes have to be fairly large to offer a stable ground. The two single-ended resonators are synchronized through the capacitors (C & C'), to form a differential VCO circuit.

The inductor is the primary phase noise limiting component in an LC resonator VCO circuit.
The inductors can be realized using either transmission lines $(l \le \lambda/4)$ or conductor coils. The coils have a higher inductance for the same metal width and length, and can therefore be made shorter. The smaller coil offers less loss, leading to a higher self-resonance frequency, better Q and less noise. Coils are generally used at lower frequencies [99, 105], whereas transmission lines dominate the W-band [78, 81, 10], though there is a notable exception [84]<sup>8</sup>. A study of recent CDR circuit milestones reveals that all the design teams have chosen transmission lines for their VCO circuits [10, 7, 3, 4], even when a familiar alternative existed [84]. The transmission line models inherent in Cadence and ADS also offered reliable predictions of performance, compared to the more cumbersome EM simulations of the coil. The ADS and Cadence models were compared with each other, as well as an extracted model, and the results were found to be very close. A transmission line (about 200 $\mu$m) was selected as the inductive element to play it safe.

The layout of the Colpitt VCO #1 chip is shown in fig. 4.9, and two photos are shown in fig. 4.10 & 4.11. The transmission lines, in the form of two loops, are dominant features. The tangerine, grated areas are decoupling capacitors. The capacitors are used for both stabilising the external tuning voltage and decoupling of the supply voltage.

The Colpitt VCO circuit #1 has been designed in two versions: with (fig. 4.11) and without (fig. 4.10) the static divider presented in chapter 3<sup>9</sup>. The simple version offers full-rate clock performance with very low power consumption, while the divider version offers a quadrature half-rate clock signal. The lower output frequency of the divider version makes it much simpler to test. Only one of the (differential) quadrature clock signals is connected to a pad.
The remaining clock signals are safely assumed to perform as expected given correct operation of the first clock signal. Determining the relative phases of the four clock signals in the V-band would be quite a challenge, without providing any additional, useful information.

<sup>8</sup>The design uses a single turn spiral inductor.

<sup>9</sup>There are also other variations. Some versions have longer transmission lines to ensure lower oscillation frequency. These would be useful if the VCO frequency was too far off the target, beyond the W-band. Fortunately, this situation never occurred.

Figure 4.9: Layout of the Colpitt VCO circuit #1.

The power and phase noise spectra of the Colpitt VCO circuit #1 are shown in figs. 4.12 & 4.13. The clock signal is seen as a peak in the power spectrum. The phase noise is measured as a function of the frequency offset, and is shown in fig. 4.13. The phase noise decreases with the offset frequency away from the carrier frequency. Phase noise as a figure of merit is related to a particular offset frequency, typically given at 100 kHz or 1 MHz. The phase noise makes it easy to compare different VCO circuits, assuming that they have the same oscillation frequency. The comparison can be extended to the case where the VCO circuits are operating at different frequencies. This is done by normalising the frequency axis, based on the assumption that the shape of the noise spectrum should match relative to the center frequency.

The Colpitt VCO circuit #1 can be tuned to an oscillation frequency of 96.17 GHz, an output power of -7.77 dBm and a phase noise of -81.33 dBc at 1 MHz. The output power appears almost constant over the tuning range. More surprisingly, the phase noise also appears to be almost independent of the tuning voltage. The phase noise measurement itself contains some noise, and it is possible that the difference is too small to be detected.
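The normalisation mentioned above can be illustrated with a short sketch. It assumes, as one common convention rather than the procedure used in this work, that a spectrum which keeps its shape relative to the carrier changes by $20\log_{10}(f_{ref}/f_0)$ at a fixed offset when referred to another carrier frequency. The function name and the choice of 100 GHz as the common reference are illustrative assumptions; the measured values are those quoted for the Colpitt VCO circuit #1.

```python
import math


def refer_to_carrier(phase_noise_dbc, f_carrier_hz, f_ref_hz):
    """Refer a phase noise value at a fixed offset to another carrier.

    Assumption (not from the thesis): the noise spectrum scales with the
    carrier like an ideal frequency multiplier, i.e. by
    20*log10(f_ref / f_carrier) at the same offset frequency.
    """
    return phase_noise_dbc + 20.0 * math.log10(f_ref_hz / f_carrier_hz)


# Values quoted in the text for the Colpitt VCO circuit #1
measured_dbc = -81.33      # phase noise at 1 MHz offset
carrier_hz = 96.17e9       # measured oscillation frequency
reference_hz = 100e9       # common reference, chosen here for illustration

normalised_dbc = refer_to_carrier(measured_dbc, carrier_hz, reference_hz)
print(f"{normalised_dbc:.2f} dBc at 1 MHz, referred to a 100 GHz carrier")
```

The correction is only a fraction of a dB here because the carrier is already close to 100 GHz; it becomes significant when oscillators an octave or more apart are compared.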
The Colpitt VCO circuit #1 is a very satisfying design. It has a relatively simple architecture, the inductors and capacitors are well modeled (as it oscillates close to the target frequency of $100~\mathrm{GHz}$), it has a very low power consumption, and the output power can easily be increased by adjusting the buffer. However, the phase noise is more than 3 dB higher than the -85 dBc predicted by the ADS modeling.

Figure 4.10: Photo of the Colpitt VCO circuit #1.

Figure 4.11: Photo of the Colpitt VCO circuit #1 including a static divider.

Figure 4.12: Colpitt VCO circuit #1 (mixed) power spectrum.

Figure 4.13: Colpitt VCO circuit #1 (mixed) phase noise spectrum.

## 4.5 Negative resistance VCO circuit

The CDR circuit requires an operational VCO circuit. The Colpitt VCO circuit #1 was made for this purpose. The very long processing and design cycles of the VIP-2 process emphasize the importance of obtaining working InP circuits on the first run. A second VCO circuit was designed simultaneously, to ensure redundancy in case the Colpitt VCO circuit #1 did not perform as expected. A different architecture was selected for the secondary VCO circuit, to spread the risk even further. The second VCO circuit has an architecture known as a negative resistance type. The schematic is shown in fig. 4.14. A more detailed schematic can be found in appendix C, see fig. C.3.

The oscillator tank is formed by L (L') and C (C'). The differential architecture is cross-coupled to give a negative resistance when viewed from the tank. The capacitor is a B-CE coupled transistor, similar to the Colpitt VCO circuit #1. The inductor is in the form of a transmission line (an M4 microstrip over M2 ground), but the effective capacitance makes the required inductance much smaller. Thus, the length of the transmission line is decreased accordingly, to about 85 $\mu$m.
This is somewhat problematic, as a short length makes the transmission line more sensitive to fringe effects and more difficult to model reliably. The output is further isolated from the tank by means of a buffer circuit, not shown in the schematic.

Figure 4.14: Schematic of the LC type VCO circuit.

The layout of the negative resistance VCO circuit is shown in fig. 4.15. Two versions of the negative resistance VCO circuit were made, shown in figs. 4.16 and 4.17. The second version (fig. 4.17) is fitted with an additional static divider. The layout is obviously pad limited by the GSGSG probe required for biasing the circuit. The surplus area is covered by decoupling capacitors.

The power and phase noise spectra of the negative resistance type VCO circuit are shown in figs. 4.18 & 4.19. The frequency of the negative resistance VCO circuit was a bit of a disappointment, as it is operating at the fringe (69.5-76.95 GHz) of the W-band (75-110 GHz). Phase noise (better than -80 dBc at 1 MHz over the tuning bandwidth), output power (>-10 dBm) and power consumption (115 mW) were closer to what the simulations predicted.

The relatively short transmission line is partially to blame for the low frequency, but not to the extent seen here. The design is fairly small, though some interconnects have lengths approaching the same order as the length of the transmission line. However, EM simulations (Momentum) of the entire layout with distributed ports, performed prior to tape-out, did not uncover such problems. The transistor model provided by Vitesse is fairly simplistic when connected as a B-CE diode in the reverse bias region. The effects of this were quite obvious when the B-CE diode was initially simulated to characterize the component. However, this (less than ideal) behavior does not seem to affect the Colpitt VCO circuit #1 much. The combination of the above factors seems insufficient to fully explain the discrepancy, and the matter has not yet been completely resolved.
## 4.6 Colpitt VCO circuit #2 (coplanar waveguide)

For a while, it was thought that the two VCO circuit designs presented in the previous sections were not working properly, having extremely low output power and dismal phase noise. It was eventually discovered that the perceived problems were due to an incorrect calibration of the W-band mixer during the measurement setup. The faulty measurements prompted a lengthy investigation into the properties of the VCO circuits, without discovering any obvious flaw in either design. DRC, LVS, extraction process and models were found to be sound, leaving insufficient decoupling of the tank, parasitic tank losses<sup>10</sup>, or the transmission line modeling as possible (though unlikely) sources of the problems. The fact that both circuits exhibited identical problems made the situation even more perplexing.

Figure 4.15: Layout of the negative resistance VCO circuit.

The unexpected development forced an immediate VCO redesign. A Colpitt type VCO design using coplanar waveguides instead of microstrip lines was made, partially based on [80, 81]. The schematic is shown in fig. 4.20. A more detailed schematic can be found in appendix C, see fig. C.2. The design is similar to the Colpitt VCO circuit #1, but it is extended with an AC-coupled output buffer. The extra amplifier was added to further isolate the tank from the output and thus eliminate a potential problem source. A pair of emitter followers level shift the signal to an appropriate level and drive a differential amplifier. The differential amplifier is then loaded by a cascode stage.

The layout of the Colpitt VCO circuit #2 is shown in fig. 4.21. A photo of the Colpitt VCO circuit #2 is shown in fig. 4.22. The picture reveals two slightly different versions of the same VCO circuit, placed adjacently. Two versions with different center frequencies were made with the intent that one of them would have the 100 GHz target within its adjustable frequency range.
The VCO circuit was processed by Vitesse free of charge, but it was necessary to make the design as compact as possible to fit into an otherwise unused space within the reticle. The two versions have common DC pads to minimize chip area. This placement saved some space, but also doubled the power consumption when one of the circuits is under test, because both circuits are powered simultaneously. The result was that the first circuit to be tested failed spontaneously during the test procedure, less than a minute after reaching nominal supply voltage. The dies were subsequently glued to a substrate using a heat conducting adhesive, which solved the problem.

<sup>10</sup>It was thought that parasitics or losses were possibly underestimated.

Figure 4.16: Photo of the negative resistance VCO circuit.

Figure 4.17: Photo of the negative resistance VCO circuit with an integrated static divider.

Figure 4.18: Negative resistance VCO circuit (mixed) power spectrum.

Figure 4.19: Negative resistance VCO circuit (mixed) phase noise spectrum.

The mirrored layout also means that the two circuits cannot be measured simply by shifting the probes, or even by rotating the entire chuck. Either the probes have to be switched or the voltage supplies reconnected if the die is rotated. Rotating one of the circuits $180^{\circ}$, instead of mirroring, would have enabled testing both circuits using the same setup, but the chuck would have to be rotated in between. Reusing pads saves space but also creates new problems.

The design rules changed between the tape-outs for Colpitt VCO circuit #1 and #2. The design rules were altered to improve yield. Circuit #2 has much smaller decoupling capacitors because the current density passing through the interconnecting VIAs was restricted to about 1/3 of the previous level.

The power and phase noise spectra of the Colpitt VCO circuit #2 have already been presented in figs. 4.6 & 4.7.
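The compensated spectra referred to above rely on the W-band bookkeeping described in section 4.3. A minimal sketch of that arithmetic is given here, using only the figures quoted in the text (LO range, harmonic number and the attenuation of mixer, probe and cable at 100 GHz). The raw analyzer reading is a hypothetical value chosen for illustration, and the function names are not taken from any instrument software.

```python
HARMONIC = 18                      # LO harmonic used by the external mixer
LO_RANGE_GHZ = (4.15, 6.09)        # LO range of the spectrum analyzer

# Attenuation at 100 GHz, as quoted in section 4.3 (dB)
ATTENUATION_DB = {"mixer": 40.9, "probe": 1.191, "cable": 2.46}


def covered_band_ghz(lo_range_ghz, harmonic):
    """RF band reached by mixing with the given LO harmonic."""
    return tuple(f * harmonic for f in lo_range_ghz)


def compensate(reading_dbm, attenuation_db):
    """Add the path attenuation back onto a raw analyzer reading."""
    return reading_dbm + sum(attenuation_db.values())


band_ghz = covered_band_ghz(LO_RANGE_GHZ, HARMONIC)
total_loss_db = sum(ATTENUATION_DB.values())

raw_reading_dbm = -54.5            # hypothetical uncompensated reading
compensated_dbm = compensate(raw_reading_dbm, ATTENUATION_DB)

print(f"covered band: {band_ghz[0]:.2f}-{band_ghz[1]:.2f} GHz")
print(f"total attenuation at 100 GHz: {total_loss_db:.2f} dB")
print(f"compensated power: {compensated_dbm:.2f} dBm")
```

Since the attenuation is frequency dependent, a real conversion would look up the loss per frequency point (Appendix D) instead of using the single 100 GHz value assumed here.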
## 4.7 VCO conclusion

Three different VCO circuits have been designed and tested. A tight schedule and extensive manufacturing time have only allowed a single run of each design. The test results are presented in table 4.1. The output power is stated as both single-ended and differential, because all the VCO circuits are fully differential designs, but measurements have been single-ended. The differential output power is 3 dB higher. The supply voltage is stated as -5.0 V for all designs, but they operate up to about -4.0 V<sup>11</sup>. Power consumption (but also output power) can be reduced by increasing the supply voltage<sup>12</sup>.

Two of the VCO circuits, Colpitt #1 and #2, are performing close to target and could be perfected in another run. The performance of the negative resistance VCO circuit is more dubious. The area of the three circuits is clearly limited by the number of required pads.

<sup>11</sup>$V_{SS}$ is slowly decreased once the probes have been placed. The voltage level where the VCO circuits start to oscillate is very noticeable on the spectrum analyzer.

<sup>12</sup>The supply voltage, $V_{SS}$, is negative.

Figure 4.20: Schematic of the Colpitt VCO circuit #2 (coplanar).
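The derived rows of table 4.1 (differential output power, power consumption and area) follow from the measured rows by simple arithmetic. The sketch below reproduces them from the tabulated values as a consistency check; the variable names are illustrative, and the 3 dB difference is taken as $10\log_{10}2$.

```python
import math

# Measured rows of table 4.1: (single-ended power in dBm, current in mA,
# width in um, height in um); supply voltage magnitude is 5.0 V throughout.
CIRCUITS = {
    "Colpitt 1": (-7.77, 35, 710, 496),
    "Negative resistance": (-9.21, 23, 710, 496),
    "Colpitt 2": (-9.63, 85, 683, 500),
}
SUPPLY_V = 5.0
DIFF_GAIN_DB = 10.0 * math.log10(2.0)   # two equal single-ended outputs

derived = {}
for name, (p_se_dbm, i_ma, w_um, h_um) in CIRCUITS.items():
    derived[name] = {
        "p_diff_dbm": p_se_dbm + DIFF_GAIN_DB,   # differential output power
        "power_mw": SUPPLY_V * i_ma,             # V * mA = mW
        "area_mm2": w_um * h_um * 1e-6,          # um^2 -> mm^2
    }

for name, row in derived.items():
    print(f"{name}: {row['p_diff_dbm']:.2f} dBm, "
          f"{row['power_mw']:.0f} mW, {row['area_mm2']:.3f} mm^2")
```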
| Circuit                         | Colpitt 1 | Negative resistance | Colpitt 2 |
|---------------------------------|-----------|---------------------|-----------|
| Oscillation frequency (GHz)     | 96.12     | 76.95               | 85.29     |
| Tuning bandwidth (GHz)          | 2.35      | 3.45                | 5.62      |
| Single-ended output power (dBm) | -7.77     | -9.21               | -9.63     |
| Differential output power (dBm) | -4.76     | -6.20               | -6.62     |
| Phase noise (dBc @ 1 MHz)       | -81.33    | -82.17              | -83.83    |
| Supply voltage (V)              | -5.0      | -5.0                | -5.0      |
| Current consumption (mA)        | 35        | 23                  | 85        |
| Power consumption (mW)          | 175       | 115                 | 425       |
| Width ($\mu$m)                  | 710       | 710                 | 683       |
| Height ($\mu$m)                 | 496       | 496                 | 500       |
| Area (mm<sup>2</sup>)           | 0.352     | 0.352               | 0.342     |
| Pad pitch ($\mu$m)              | 150       | 150                 | 150       |

Table 4.1: Overview of the performance of the three different VCO circuits designed for the CDR circuit.

Figure 4.21: Layout of the Colpitt VCO circuit #2 (coplanar).

Figure 4.22: Photo of the Colpitt VCO circuit #2 (coplanar).

# Chapter 5

# Phase Locked Loops

The Phase Locked Loop (PLL) is a common regulating stage. It is used to adjust the phase of an internal signal to match the phase of an external reference signal. It is typically used in communication systems to synchronize a local oscillator [106]. In the context of the CDR circuit, the PLL is used to set the frequency and phase of the internal clock signal generated by the VCO circuit to match those of the incoming data signal [107, 108].

The architecture of a basic PLL is shown in fig. 5.1. It consists of a phase detector, a low-pass filter, a linear amplifier, a VCO and a 1:N divider. Neither the amplifier nor the divider is necessary for the implementation of a basic PLL. The purpose of the amplifier will become clear later in this chapter (as well as in chapter 6). The divider is used to represent clock frequency division, such as the 1:2 divider described in chapter 3.

Figure 5.1: A basic PLL architecture.

The PLL must operate according to given specifications.
This means that the behaviour of the PLL must be predictable and a mathematical model must be found to express it. The PLL is a sampled, non-linear system, but it can be approximated as a continuous, linear, time invariant system if certain conditions are met. This simplifies the analysis considerably, because s-domain analysis can then be applied. The conditions are:

- The phase detector is the only non-linear circuit in the PLL. The output of the phase detector can be assumed to be linear and proportional to the phase difference of the two input signals. This is a reasonable approximation when the PLL is in the locked state, and the average frequency difference between the two input signals is zero.
- The phase detector is digital and samples the incoming data at discrete intervals. The entire PLL is therefore a discrete system, which is described using difference equations. This can be circumvented if the bandwidth of the PLL, $\omega_{3dB}$, is much smaller than the sampling frequency, $\frac{1}{T}$. In typical designs, the loop bandwidth is roughly one-tenth of the input frequency [107].

## 5.1 Phase detector

An ideal phase detector generates a signal, $v_{PD}$, that is proportional to the phase difference, $\Delta \phi$, between two signals:

$$v_{PD} = K_{PD} \Delta \phi \tag{5.1}$$

$K_{PD}$ is the gain of the phase detector. The ideal characteristic is illustrated in fig. 5.2.

Figure 5.2: Characteristics of an ideal phase detector.

In practice, the phase detector will exhibit a less than ideal behavior. $K_{PD}$ will not remain constant (or even linear) for large $|\Delta\phi|$. $K_{PD}$ may also depend on the amplitude and the duty cycle of the input signals. The shape of $v_{PD}$ depends on the type of phase detector. Some phase detectors generate a $v_{PD}$ in the form of a series of pulses, each pulse having a duty cycle proportional to the phase difference.
In this case, it is the average value of $v_{PD}$ over a period that indicates the phase difference. Such a phase detector can be realised using an XOR-gate. This is shown in fig. 5.3 where $Out = In \oplus Ref$. The waveforms reveal that Out has cyclic behaviour and changes as In & Ref move in and out of phase. This is because Ref has a higher frequency than In. For the same reason, Out has an average value of zero over an entire period.

Figure 5.3: Waveforms for a XOR-type phase detector. $Out = In \oplus Ref$.

## 5.2 Low-pass filter

The low-pass filter suppresses high-frequency components in the phase detector output, allowing the DC value to control the frequency of the VCO. There are many different types of low-pass filters. A charge-pump filter [107, 11, 10, 106] will be used later and will hence serve as a useful example. The schematic is shown in fig. 5.4.

Figure 5.4: Passive low-pass filter based on the charge-pump principle.

The output voltages can be expressed as:

$$V_{out} = -\overline{V_{in}}g_{m} \frac{R_{C}\left(R_{S} + R_{C} + \frac{1}{sC_{S}}\right)}{2R_{C} + R_{S} + \frac{1}{sC_{S}}} - V_{in}g_{m} \frac{R_{C}\left(R_{S} + R_{C} + \frac{1}{sC_{S}}\right)}{2R_{C} + R_{S} + \frac{1}{sC_{S}}} \times \frac{R_{C}}{R_{C} + R_{S} + \frac{1}{sC_{S}}} = \frac{-g_{m}R_{C}}{2R_{C} + R_{S} + \frac{1}{sC_{S}}} \left[\overline{V_{in}}\left(R_{S} + R_{C} + \frac{1}{sC_{S}}\right) + V_{in}R_{C}\right] \tag{5.2}$$

$$\overline{V_{out}} = \frac{-g_{m}R_{C}}{2R_{C} + R_{S} + \frac{1}{sC_{S}}} \left[V_{in}\left(R_{S} + R_{C} + \frac{1}{sC_{S}}\right) + \overline{V_{in}}R_{C}\right] \tag{5.3}$$

The differential output voltage and transfer function can then be found:

$$V_{out} - \overline{V_{out}} = \frac{-g_m R_C}{2R_C + R_S + \frac{1}{sC_S}} \left[ \overline{V_{in}} \left( R_S + R_C + \frac{1}{sC_S} \right) + V_{in} R_C \right] - \frac{-g_m R_C}{2R_C + R_S + \frac{1}{sC_S}} \left[ V_{in} \left( R_S + R_C + \frac{1}{sC_S} \right) + \overline{V_{in}} R_C \right] = \left( \overline{V_{in}} - V_{in} \right) \frac{-g_m R_C \left( R_S + \frac{1}{sC_S} \right)}{2R_C + R_S + \frac{1}{sC_S}} \Rightarrow \tag{5.4}$$

$$H_{LP}(s) = \frac{V_{out} - \overline{V_{out}}}{V_{in} - \overline{V_{in}}} = g_m R_C \frac{sR_S C_S + 1}{s(2R_C + R_S)C_S + 1} \tag{5.5}$$

This can be recast in the form of a first order, passive lag-lead filter:

$$H_{LP}(s) = K_{LP} \frac{s\tau_2 + 1}{s\tau_1 + 1} = K_{LP}F(s) \tag{5.6}$$

where:

$$K_{LP} = g_m R_C \tag{5.7}$$

$$\tau_1 = (2R_C + R_S) C_S \tag{5.8}$$

$$\tau_2 = R_S C_S \tag{5.9}$$

The filter transfer function is generally denoted F(s). Here the charge pump factor, $K_{LP}$, has been isolated from the passive filter, F(s).

An active low-pass filter will also be introduced. It will help to demonstrate the theoretical differences between it and the previous passive filter. An active low-pass filter could be implemented using an operational amplifier based on gyrators, as shown in fig. 5.5.

Figure 5.5: Active low-pass filter.

The transfer function is:
$$V_{out} - \overline{V_{out}} = A\left(V_a - \overline{V_a}\right) \Rightarrow V_a - \overline{V_a} = \frac{V_{out} - \overline{V_{out}}}{A} \tag{5.10}$$

$$V_a - \overline{V_a} = V_{in} - \overline{V_{in}} - \frac{R_1 \left( V_{in} - \overline{V_{in}} + V_{out} - \overline{V_{out}} \right)}{R_1 + R_2 + \frac{1}{sC}} \Rightarrow \tag{5.11}$$

$$\left(V_{out} - \overline{V_{out}}\right) \left(\frac{1}{A} + \frac{R_1}{R_1 + R_2 + \frac{1}{sC}}\right) = \left(V_{in} - \overline{V_{in}}\right) \left(1 - \frac{R_1}{R_1 + R_2 + \frac{1}{sC}}\right) \Rightarrow \tag{5.12}$$

$$H_{LP}\left(s\right) = \frac{V_{out} - \overline{V_{out}}}{V_{in} - \overline{V_{in}}} = A \frac{R_1 + R_2 + \frac{1}{sC} - R_1}{R_1 + R_2 + \frac{1}{sC} + AR_1} = A \frac{sR_2C + 1}{sR_2C + 1 + sR_1C\left(1 + A\right)} = F\left(s\right) \tag{5.13}$$

This expression can be simplified for large values of A:

$$\lim_{A \to \infty} H_{LP}(s) = \lim_{A \to \infty} A \frac{sR_2C + 1}{sR_2C + 1 + sR_1C(1 + A)} = \frac{sR_2C + 1}{sR_1C} = \frac{s\tau_2 + 1}{s\tau_1} \tag{5.14}$$

where

$$\tau_1 = R_1 C \tag{5.15}$$

$$\tau_2 = R_2 C \tag{5.16}$$

The active low-pass filter is also of the first order, but it has a pole at the origin. This is a key difference between an active and a passive filter. It will be demonstrated (in section 5.6) to have an impact on the PLL phase error.

## 5.3 Linear amplifier

The output signal from the low-pass filter may be too weak a control signal to fully employ the dynamic frequency range of the VCO. An amplifier may then be inserted between the low-pass filter and the VCO. The amplifier is linear, which makes its transfer function quite simple:

$$v_{out} = K_{LA}v_{in} \tag{5.17}$$

$$H_{LA}(s) = \frac{V_{out}(s)}{V_{in}(s)} = K_{LA} \tag{5.18}$$

## 5.4 VCO

A VCO has a free running angular frequency of $\omega_0 = 2\pi f_0$. The angular frequency is regulated by a control voltage, $v_c$:

$$\omega_{out} = \omega_0 + K_{VCO}v_c \tag{5.19}$$

$K_{VCO}$ is the angular frequency gain of the VCO.
The output voltage at a particular time can be found by integration of the angular frequency from $-\infty$ to t:

$$v(t) = A\cos\left[\omega_0 t + K_{VCO} \int_{-\infty}^{t} v_c(\tau) d\tau\right] \tag{5.20}$$

The phase of the VCO consists of two parts: the free running phase, $\omega_0 t$, and the excess phase, $\phi_{out}(t) = K_{VCO} \int v_c(t) dt$. In PLL literature, the control voltage is usually considered the VCO input and the excess phase is the output. The output/input transfer function then becomes:

$$\phi_{out}(t) = K_{VCO} \int v_c dt \Rightarrow \tag{5.21}$$

$$\Phi_{out}\left(s\right) = K_{VCO} \frac{V_{c}\left(s\right)}{s} \Rightarrow \tag{5.22}$$

$$\frac{\Phi_{out}}{V_c}(s) = \frac{K_{VCO}}{s} \tag{5.23}$$

## 5.5 Divider

A PLL is used in many different applications, and some of these require fractional clocks. A general 1:N divider has been inserted in the PLL loop for this purpose. The phase (and frequency) of the clock output is 1/N of the phase (and frequency) of the clock input. The transfer function of the divider is:

$$H_{DIV}(s) = \frac{\omega_{out}}{\omega_{in}} = \frac{1}{N} \tag{5.24}$$

A 1:2 divider is often used in CDR circuits. This is because some phase detector architectures require quadrature-phase clock signals that can be derived using a VCO with twice the output frequency in combination with a divider circuit.

## 5.6 PLL

The transient response of a PLL requires a complicated analysis. The system is non-linear and difficult to deal with [109]. However, it is instructive to study the behaviour of the PLL in certain states.

### 5.6.1 Locked state

The PLL is assumed to be in the locked state. This implies that the loop is stable and can be assumed to be linear. The situation is shown in fig. 5.6. The transfer function is stated for each of the PLL blocks, cf. fig. 5.1.

Figure 5.6: Linear model of a PLL.

The model is intended to describe the overall transfer function of the PLL.
Hence the phase detector is represented using a subtractor and an amplifier. The subtractor calculates the phase difference between the input and output:

$$\Phi_e(s) = \Delta\Phi(s) = \Phi_{in}(s) - \Phi_{out}(s) \tag{5.25}$$

#### 5.6.1.1 PLL transfer function

The open-loop transfer function is found by removing the feed-back signal line:

$$G(s) = \frac{\Phi_{out}(s)}{\Phi_{in}(s)} = \frac{\Phi_{in}(s) K_{PD} H_{LP}(s) K_{LA} \times \frac{K_{VCO}}{s} \times \frac{1}{N}}{\Phi_{in}(s)} = \frac{K_{PD} H_{LP}(s) K_{LA} K_{VCO}}{Ns} = \frac{K H_{LP}(s)}{s} \tag{5.26}$$

The quantity $K_{PD}K_{LA}K_{VCO}/N$ is referred to as the loop gain, K. Closing the loop yields an expression for $\Phi_{out}(s)$:

$$\Phi_{out}(s) = (\Phi_{in}(s) - \Phi_{out}(s)) G(s) \tag{5.27}$$

Rearranging the terms reveals the closed-loop transfer function:

$$\Phi_{out}(s) = (\Phi_{in}(s) - \Phi_{out}(s)) G(s) \Rightarrow \tag{5.28}$$

$$\Phi_{out}(s)(1+G(s)) = \Phi_{in}(s)G(s) \Rightarrow \tag{5.29}$$

$$H(s) = \frac{\Phi_{out}(s)}{\Phi_{in}(s)} = \frac{G(s)}{1 + G(s)} = \frac{KH_{LP}(s)}{s + KH_{LP}(s)} \tag{5.30}$$

The phase error transfer function is defined as:

$$H_{e}(s) = \frac{\Phi_{e}(s)}{\Phi_{in}(s)} = \frac{\Phi_{in}(s) - \Phi_{out}(s)}{\Phi_{in}(s)} = 1 - H(s) = \frac{s}{s + KH_{LP}(s)} \tag{5.31}$$

The transfer functions for the first order passive and active low-pass filters have previously been presented in eqs. 5.6 & 5.13.
The PLL closed loop transfer functions when using the passive and active filters are:

$$H_{P}(s) = \frac{KH_{LP}(s)}{s + KH_{LP}(s)} = \frac{KK_{LP}\left(\frac{s\tau_{2}+1}{s\tau_{1}+1}\right)}{s + KK_{LP}\left(\frac{s\tau_{2}+1}{s\tau_{1}+1}\right)} = \frac{s\tau_{2}+1}{s^{2}\frac{\tau_{1}}{KK_{LP}} + s\frac{1}{KK_{LP}} + s\tau_{2}+1} \tag{5.32}$$

$$H_A(s) = \frac{KH_{LP}(s)}{s + KH_{LP}(s)} = \frac{K\left(\frac{s\tau_2 + 1}{s\tau_1}\right)}{s + K\left(\frac{s\tau_2 + 1}{s\tau_1}\right)} = \frac{s\tau_2 + 1}{s^2\frac{\tau_1}{K} + s\tau_2 + 1} \tag{5.33}$$

The transfer functions are those of low-pass filters, suggesting that if the input phase varies slowly, then the output excess phase follows, and conversely, if the input excess phase varies rapidly, the output excess phase variation will be small. In particular if $s \to 0$ then $H(s) \to 1$; i.e. a static phase shift at the input is transferred to the output unchanged.

The next step is to recast the PLL closed-loop transfer function into a familiar form used in control theory for filters [106, 10, 11]<sup>1</sup>. In the expressions that follow and in table 5.1, the factor $K_{LP}$ of the passive filter is taken to be absorbed into the loop gain $K$. The transfer functions can be rewritten as:

$$H_P(s) = \frac{s\left(2\zeta\omega_n - \omega_n^2/K\right) + \omega_n^2}{s^2 + 2\zeta\omega_n s + \omega_n^2} \tag{5.34}$$

$$H_A(s) = \frac{2\zeta\omega_n s + \omega_n^2}{s^2 + 2\zeta\omega_n s + \omega_n^2} \tag{5.35}$$

| Passive filter | Active filter |
|------------------------------------------------------------------|--------------------------------------|
| $\tau_1 = (2R_C + R_S) C_S$ | $\tau_1 = R_1 C$ |
| $\tau_2 = R_S C_S$ | $\tau_2 = R_2 C$ |
| $\omega_n = \sqrt{\frac{K}{\tau_1}}$ | $\omega_n = \sqrt{\frac{K}{\tau_1}}$ |
| $\zeta = \frac{\omega_n}{2} \left( \tau_2 + \frac{1}{K} \right)$ | $\zeta = \frac{\tau_2 \omega_n}{2}$ |

Table 5.1: Parameters for a second order PLL with passive and active filters.

$\zeta$ is denoted the damping factor and $\omega_n$ is the natural frequency of the filter.
They can both be expressed in terms of filter parameters $(\tau_1 \& \tau_2)$ for the passive and active low-pass filters. The terms are summarised in table 5.1. The damping factor is a function of the loop gain. This means that the two parameters cannot be chosen independently. The highest power of s in the denominator of the PLL transfer functions is 2 and the loop is therefore denoted a second-order loop. This form of second-order loop is widely applied because of its simplicity and good performance [106, 107].

<sup>1</sup>The PLL transfer function in [106] is correct, whereas the transfer functions (for the same type of filter) in [11, 10] are not. An active filter is mistakenly assumed in [11, 10]. The active and passive transfer functions are nearly the same if $K\tau_2 \gg 1$ in the passive filter.

Figure 5.7: PLL frequency response; $|H(s)|^2$.

The amplitude of the transfer function is shown in fig. 5.7 for several values of $\zeta$. A large loop gain is assumed $(K\tau_2\gg 1)$ because it simplifies the expression $(H_P(s)\approx H_A(s))$. The graph shows that the PLL acts as a low-pass filter on the incoming phase, $\Phi_{in}$. All of the curves appear to cross the 0 dB threshold at the same $\omega$, independent of $\zeta$. They do indeed; with $s = j\omega$:

$$|H(s)| = \left| \frac{2\zeta\omega_n s + \omega_n^2}{s^2 + 2\zeta\omega_n s + \omega_n^2} \right| = 1 \Rightarrow \tag{5.36}$$

$$\left| 2\zeta\omega_n s + \omega_n^2 \right| = \left| s^2 + 2\zeta\omega_n s + \omega_n^2 \right| \Rightarrow \tag{5.37}$$

$$\left|\omega_n^2\right| = \left|-\omega^2 + \omega_n^2\right| \Rightarrow \tag{5.38}$$

$$\omega^2 = 2\omega_n^2 \Rightarrow \tag{5.39}$$

$$\omega = \sqrt{2}\omega_n \tag{5.40}$$

which is indeed independent of $\zeta$.
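The result in eq. 5.40 is easy to confirm numerically. The sketch below evaluates the large-loop-gain transfer function of eq. 5.35 at $\omega = \sqrt{2}\omega_n$ for a handful of damping factors; the chosen values of $\zeta$ and the normalisation $\omega_n = 1$ are arbitrary illustrations.

```python
import math


def closed_loop(omega, omega_n, zeta):
    """Second-order PLL transfer function H_A(j*omega), eq. 5.35."""
    s = 1j * omega
    numerator = 2.0 * zeta * omega_n * s + omega_n**2
    denominator = s**2 + 2.0 * zeta * omega_n * s + omega_n**2
    return numerator / denominator


omega_n = 1.0                          # normalised natural frequency
zetas = (0.3, 0.5, 0.707, 1.0, 2.0, 5.0)

# |H| at the predicted 0 dB crossing, omega = sqrt(2) * omega_n
crossing_magnitudes = [
    abs(closed_loop(math.sqrt(2.0) * omega_n, omega_n, zeta)) for zeta in zetas
]

for zeta, magnitude in zip(zetas, crossing_magnitudes):
    print(f"zeta = {zeta:5.3f}: |H| = {magnitude:.6f}")
```

Below the crossing the magnitude exceeds unity (the peaking seen in fig. 5.7), and above it the response falls off, for every value of $\zeta$.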
The 3 dB bandwidth is found in a similar manner, by setting $|H\left(s\right)|^{2}=0.5$:

$$\omega_{3dB} = \omega_n \left[ 2\zeta^2 + 1 + \sqrt{(2\zeta^2 + 1)^2 + 1} \right]^{0.5} \tag{5.41}$$

The 3 dB bandwidth is generally not of much practical use [106, 107], but it is interesting to notice that $\omega_{3dB} \propto \omega_n$.

#### 5.6.1.2 PLL error transfer function

The next step is to look at the phase error in the locked state. The phase error transfer function is defined as:

$$H_e(s) = (\Phi_{in}(s) - \Phi_{out}(s)) / \Phi_{in}(s) = 1 - H(s) \tag{5.42}$$

The error functions for the passive and active filters are found by inserting the appropriate $H(s)$:

$$H_{e,P}(s) = 1 - H_P(s) = 1 - \frac{s(2\zeta\omega_n - \omega_n^2/K) + \omega_n^2}{s^2 + 2\zeta\omega_n s + \omega_n^2} = \frac{s^2 + s\omega_n^2/K}{s^2 + 2\zeta\omega_n s + \omega_n^2} = \frac{s(s+1/\tau_1)}{s^2 + 2\zeta\omega_n s + \omega_n^2} \tag{5.43}$$

$$H_{e,A}(s) = 1 - H_A(s) = 1 - \frac{2\zeta\omega_n s + \omega_n^2}{s^2 + 2\zeta\omega_n s + \omega_n^2} = \frac{s^2}{s^2 + 2\zeta\omega_n s + \omega_n^2} \tag{5.44}$$

Again, this can be simplified if the loop gain is large:

$$H_{e,P}(s) \approx H_{e,A}(s) = \frac{s^2}{s^2 + 2\zeta\omega_n s + \omega_n^2} \cong H_e(s) \tag{5.45}$$

The phase error is:

$$\Phi_e(s) = \Phi_{in}(s) - \Phi_{out}(s) = H_e(s)\Phi_{in}(s) \tag{5.46}$$

The amplitude of the phase error frequency response is shown in fig. 5.8. The frequency response has the characteristics of a high-pass filter, indicating that the PLL tracks low-frequency phase changes (the resulting error is suppressed), whereas high-frequency changes are not tracked and appear in full in the phase error.

Figure 5.8: PLL error response; $|H_e(s)|^2$.

#### 5.6.1.3 PLL error response

The PLL will react to changes in the input signal. It is important that the PLL is able to recover its locked state when confronted with changes in the input signal. This is the reason for evaluating how the PLL responds to a step change in input phase or frequency. The Heaviside step function is denoted $H(t)$<sup>2</sup>.
The phase shift is represented in the time and s domain as:

$$u_{\phi}(t) = H(t) \Delta \phi \Longleftrightarrow \frac{\Delta \phi}{s} = U_{\phi}(s) \tag{5.47}$$

<sup>2</sup>Not to be confused with the transfer function, H(s).

A frequency shift is represented as:

$$u_{\omega}(t) = H(t) \Delta \omega \Longleftrightarrow \frac{\Delta \omega}{s^2} = U_{\omega}(s) \tag{5.48}$$

The additional factor of $1/s$ arises because frequency is the time derivative of phase, so the frequency step is integrated to obtain the corresponding input phase. The response of the PLL is:

$$Y(s) = H(s)U(s) \tag{5.49}$$

Solving in the time domain can be tricky, but it is simpler to analyse the steady-state error remaining after any transients have died away. The final value theorem states that:

$$\lim_{t \to \infty} y(t) = \lim_{s \to 0} sY(s) \tag{5.50}$$

This implies that the long-time behaviour is determined by the low-frequency behaviour, i.e. by $Y(s)$ close to the origin of the s-plane. Application of the final value theorem to the passive filter yields:

$$\lim_{t \to \infty} y(t) = \lim_{s \to 0} s H_{e,P}(s) U_{\phi}(s) = \lim_{s \to 0} s \times \frac{s(s+1/\tau_1)}{s^2 + 2\zeta\omega_n s + \omega_n^2} \times \frac{\Delta\phi}{s} = \lim_{s \to 0} \frac{\Delta\phi(s^2 + s/\tau_1)}{s^2 + 2\zeta\omega_n s + \omega_n^2} = 0 \tag{5.51}$$

The steady-state error response to a step change in phase shows that the PLL will (eventually) recover. The conclusion is that any phase change in the input will be transferred to the output, which is to be expected as this is the purpose of the PLL. The steady-state error response to a step change in frequency is:

$$\lim_{t \to \infty} y(t) = \lim_{s \to 0} s H_{e,P}(s) U_{\omega}(s) = \lim_{s \to 0} s \times \frac{s(s+1/\tau_1)}{s^2 + 2\zeta\omega_n s + \omega_n^2} \times \frac{\Delta\omega}{s^2} = \lim_{s \to 0} \frac{\Delta\omega(s+1/\tau_1)}{s^2 + 2\zeta\omega_n s + \omega_n^2} = \frac{\Delta\omega}{\tau_1\omega_n^2} = \frac{\Delta\omega}{K} \tag{5.52}$$

This is a somewhat problematic but also predictable result.
The VCO frequency will change to match the frequency of the input signal, but a small, fixed phase shift will occur. The sampling operation of the phase detector/decision circuit demands that bits are captured in the middle of the bit period to secure optimum performance. A change in frequency therefore moves the sampling instant away from the optimum, making the samples less reliable and degrading the BER. The size of the phase shift is to be expected: a change in VCO control voltage ($\Delta \omega/K_{VCO}$) is required to change the VCO frequency, and since the control voltage is derived from the phase detector, it can only be sustained by a static phase mismatch ($\Delta \omega N/K_{PD}K_{LA}K_{VCO}$).

The steady-state error response for the active filter to a change in phase is:

$$\lim_{t\to\infty}y\left(t\right)=\lim_{s\to 0}sH_{e,A}\left(s\right)U_{\phi}\left(s\right)=\lim_{s\to 0}s\times\frac{s^{2}}{s^{2}+2\zeta\omega_{n}s+\omega_{n}^{2}}\times\frac{\Delta\phi}{s}= \lim_{s \to 0} \frac{s^2 \Delta \phi}{s^2 + 2\zeta \omega_n s + \omega_n^2} = 0 \tag{5.53}$$

Like the passive filter, the active filter (eventually) corrects any change in phase. The same response for a change in frequency is:

$$\lim_{t \to \infty} y(t) = \lim_{s \to 0} s H_{e,A}(s) U_{\omega}(s) = \lim_{s \to 0} s \times \frac{s^2}{s^2 + 2\zeta\omega_n s + \omega_n^2} \times \frac{\Delta\omega}{s^2} = \lim_{s \to 0} \frac{s\Delta\omega}{s^2 + 2\zeta\omega_n s + \omega_n^2} = 0 \tag{5.54}$$

This reveals that the active filter will (eventually) change both its frequency and phase in response to a change in either. This is the chief virtue of the active filter. It should be noted that the passive filter is an approximation of the active filter for large values of K, cf. eq. 5.52.

The error response in the time domain can be found by calculating the inverse Laplace transformation of the error response in the s-domain. An underdamped loop, $\zeta < 1$, is assumed for the transformation to be valid.
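The steady-state limits of eqs. 5.51 to 5.54 can be cross-checked numerically by evaluating $sH_e(s)U(s)$ at a small real value of $s$. The following is a minimal sketch under stated assumptions: the loop values `K`, `tau1` and `tau2`, the step sizes and all function names are hypothetical and chosen only for illustration, while $\omega_n$ and $\zeta$ follow table 5.1.

```python
import math

def error_tf_passive(s, K, tau1, tau2):
    """H_e,P(s) of eq. 5.43, with omega_n and zeta from table 5.1 (passive filter)."""
    omega_n = math.sqrt(K / tau1)
    zeta = 0.5 * omega_n * (tau2 + 1.0 / K)
    return s * (s + 1.0 / tau1) / (s**2 + 2 * zeta * omega_n * s + omega_n**2)

def error_tf_active(s, K, tau1, tau2):
    """H_e,A(s) of eq. 5.44, with omega_n and zeta from table 5.1 (active filter)."""
    omega_n = math.sqrt(K / tau1)
    zeta = 0.5 * tau2 * omega_n
    return s**2 / (s**2 + 2 * zeta * omega_n * s + omega_n**2)

def steady_state_error(error_tf, step_order, amplitude, loop, s=1e-6):
    """Approximate lim s*H_e(s)*U(s) for s -> 0 by evaluating at a small s.

    step_order = 1 is a phase step (U = amplitude / s),
    step_order = 2 is a frequency step (U = amplitude / s**2).
    """
    return s * error_tf(s, **loop) * amplitude / s**step_order

# Hypothetical loop parameters: loop gain in 1/s, time constants in s.
loop = {"K": 1e6, "tau1": 1e-3, "tau2": 1e-4}
delta_phi = 0.5      # assumed phase step, rad
delta_omega = 1e3    # assumed frequency step, rad/s

passive_phase = steady_state_error(error_tf_passive, 1, delta_phi, loop)
passive_freq = steady_state_error(error_tf_passive, 2, delta_omega, loop)
active_phase = steady_state_error(error_tf_active, 1, delta_phi, loop)
active_freq = steady_state_error(error_tf_active, 2, delta_omega, loop)

print("passive, phase step:    ", passive_phase)
print("passive, frequency step:", passive_freq, "expected", delta_omega / loop["K"])
print("active, phase step:     ", active_phase)
print("active, frequency step: ", active_freq)
```

Only the passive filter with a frequency step should leave a residual error, equal to $\Delta\omega/K$; the other three cases tend to zero as $s$ is made smaller.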
The error response is:

$$Y\left(s\right) = H_{e,A}\left(s\right)U_{\phi}\left(s\right) = \frac{s^{2}}{s^{2} + 2\zeta\omega_{n}s + \omega_{n}^{2}} \times \frac{\Delta\phi}{s} = \frac{s\Delta\phi}{s^{2} + 2\zeta\omega_{n}s + \omega_{n}^{2}} \Rightarrow \tag{5.55}$$

$$\begin{aligned}
\mathcal{L}^{-1}\left(Y\left(s\right)\right) &= \mathcal{L}^{-1}\left(\frac{s\Delta\phi}{s^2 + 2\zeta\omega_n s + \omega_n^2}\right) = \Delta\phi\mathcal{L}^{-1}\left(\frac{s}{s^2 + 2\zeta\omega_n s + \omega_n^2}\right) \\
&= \Delta\phi\mathcal{L}^{-1}\left(\frac{s+\zeta\omega_n}{\left(s+\zeta\omega_n\right)^2+\left(1-\zeta^2\right)\omega_n^2}-\frac{\zeta}{\sqrt{1-\zeta^2}}\times\frac{\sqrt{1-\zeta^2}\omega_n}{\left(s+\zeta\omega_n\right)^2+\left(1-\zeta^2\right)\omega_n^2}\right) \\
&= \Delta\phi H\left(t\right)\left(e^{-\zeta\omega_{n}t}\cos\left(\sqrt{1-\zeta^{2}}\omega_{n}t\right)-\frac{\zeta}{\sqrt{1-\zeta^{2}}}\times e^{-\zeta\omega_{n}t}\sin\left(\sqrt{1-\zeta^{2}}\omega_{n}t\right)\right) \\
&= \Delta\phi H\left(t\right) \left[\cos\left(\sqrt{1-\zeta^{2}}\omega_{n}t\right) - \frac{\zeta}{\sqrt{1-\zeta^{2}}}\sin\left(\sqrt{1-\zeta^{2}}\omega_{n}t\right)\right]e^{-\zeta\omega_{n}t}
\end{aligned} \tag{5.56}$$

The Heaviside step function, $H(t)$, can be ignored if $t>0$ is assumed. This is reasonable because the error response is obviously 0 for $t<0$ where $u_{\phi}(t)=0$, so only $t>0$ is of any interest. The $t>0$ condition is implied in the remainder of the chapter. The error response can be calculated for other values of $\zeta$ in a similar manner.

The phase error response for a phase step change is shown in fig. 5.9. It has the form of two superimposed, exponentially decaying waves. The frequency step error response is calculated in a similar manner, but is a bit simpler because it does not require factorisation. The results are presented in table 5.2 [106].

Table 5.2: Transient phase error, $\phi_e(t)$, for second order PLLs for phase and frequency step changes in input. K is assumed to be large. The factor H(t) has been removed, based on the assumption that t > 0.

Figure 5.9: PLL phase error, $\phi_e(t)$, in response to a step change in phase, $\Delta\phi$.

The phase error response for a frequency step change is shown in fig. 5.10. It obviously starts with $\phi_e(0) = 0$. The phase error then accumulates before the filter responds and adapts. The phase error will eventually disappear, cf. eq. 5.54.

The active filter has two important advantages over passive filters:

1. The capture range is only limited by the VCO output frequency range. The phase detector and filter combination effectively has an infinite gain.
2. The static phase error is zero if mismatches and offsets are negligible.

## 5.6.2 Tracking and acquisition

The PLL also has two dynamic properties: tracking and acquisition. Tracking describes how the PLL is able to maintain its locked state by adjusting to any changes in the input frequency and phase. Acquisition is the process of the PLL going from the non-locked state to the locked state. A detailed analysis is beyond the scope of this thesis. More information can be found in e.g. [110, 111].

It should be mentioned that many PLLs are fitted with both frequency and phase detectors in order to have a sufficiently wide tracking and acquisition frequency band. The frequency detector takes over when the PLL has lost its locked state. The frequency detector adjusts the frequency of the VCO into a narrow band, close to the frequency of the incoming data. Then the phase detector takes over and brings the PLL to the locked state. The PLL may also repeatedly lose its locked state because of a change in channel or an absence of incoming data. The PLL must then begin to acquire its lock again. Almost all experimental circuits lack integrated frequency detectors but rely on external (manual) frequency adjustments, e.g. [12, 7, 2, 5, 6, 3, 4]. A rare exception is possibly provided by [8].
Figure 5.10: PLL phase error, $\phi_e(t)$, in response to a step change in frequency, $\Delta\omega$.

Again, most communication systems use interfaces with extremely well defined frequencies. The tracking and acquisition range of a manufactured circuit is subject to process variation and cannot be equally well defined. The solution is to make the PLL frequency band wide enough to encompass the interface frequencies while allowing for the full range of process variation.

# Chapter 6

# Clock and data recovery

The CDR circuit is the most complex of the transceiver circuits and it contains a combination of analog and digital building blocks. The CDR circuit captures and resynchronizes the incoming data, providing coherent data and clock signals for subsequent demultiplexing or other processing. The circuit consists of a D flip-flop for data capture and a variable oscillator providing the clock signal. The CDR circuit is required to produce a valid clock output signal even in the event of a long string of Consecutive Identical Digits (CID) [112].

There are two fundamentally different architectures: the filter (or ringing tank) and the PLL. The filter design is based on a ringing tank excited by a specific frequency component present in the spectrum of the incoming data signal, as shown in fig. 6.1.

Figure 6.1: Architecture of filter type CDR.

The induced clock signal latches the incoming data, and the clock frequency must therefore match the incoming data rate. The tank is designed to resonate at the desired frequency and the data signal is required to contain such a spectral component. This reveals an important shortcoming of the ringing tank: it is incapable of generating a clock signal independent of the incoming data. Once excited, the filter will continue to produce a clock signal in the absence of data transitions, but the filter will eventually ring down and the clock signal will cease.
A differential amplifier can be used to improve the clock amplitude and shape and thus extend the time it takes for the filter signal to become useless, but eventually the outcome is the same. The filter itself can be improved by increasing the Q, thus slowing the decay. However, an improved Q is more demanding on center frequency accuracy because the center frequency has to closely match the frequency of the incoming data<sup>1</sup> or oscillation will fail.

<sup>1</sup>Data frequencies are very narrowly defined by communication protocols. The frequency matching problem is thus confined to the filter itself, which has to match the frequency specified by the protocols. The filter requires high Q (slow decay), small process variation (predictable and consistent center frequency) and small temperature dependence (to avoid fluctuations in center frequency), which poses a serious challenge. Any deviation in center frequency will result in static phase errors. Static phase errors must always be compensated for in the retiming path, but temperature instability of the static phase is more difficult to accommodate.

The Discrete Fourier Transform (DFT)/Fast Fourier Transform (FFT) power spectrum of an ideal random binary pattern at 100 Gb/s is shown in fig. 6.2 for both Return-to-Zero (RZ) and Non-Return-to-Zero (NRZ) encoding (of the same signal) [113]. The graphs show that the spectrum depends on the data source and choice of line code. A line code can also be used to avoid extended CID, at the cost of some bandwidth. The NRZ spectrum is widely dispersed, almost white, whereas the RZ spectrum has a prominent component at the data rate. The bit pattern has in both cases been balanced to avoid a DC component<sup>2</sup>. The example demonstrates that the spectrum can be manipulated by coding schemes.

Figure 6.2: Spectra of random RZ and NRZ signals.

<sup>2</sup>The bit pattern has a zero average.

The RZ spectrum has the desired frequency component (at the data rate), whereas the NRZ spectrum has not. A non-linear circuit, denoted as " $X^2$ " in fig. 6.1, can process the NRZ signal to create the required frequency component. The circuit could be implemented as a simple XOR gate, effectively detecting the transitions in the NRZ encoding, as demonstrated in fig. 6.3. The result is that every transition in the data stream generates a pulse with a period corresponding to the desired frequency. The delay element provides a fixed time delay and is implemented either as a transmission line (with an appropriate length) or as a series of simple gates (with a known delay).

Figure 6.3: A non-linear element implementation based on a XOR gate and a fixed time delay, and the corresponding waveforms.

There are also numerous ways to implement a XOR gate. A design based on NAND gates is shown in fig. 6.4. This particular implementation is unlikely to be used in practice, but it remains instructive because it demonstrates a fundamental weakness of logic gates operating near their physical limits. The signals do not progress symmetrically through the XOR gate. The asymmetric signal delay will result in an output signal with spikes and a skewed duty cycle, even when assuming ideal connections (with zero delay) between the NAND gates. Delay elements could be inserted between the NAND gates to ensure that transitions reach the NAND gates at the same time, but the NAND gates themselves are inherently asymmetric.

Figure 6.4: A XOR gate implementation based on four NAND gates.

The filter architecture is simple to implement, can operate at high speeds and exhibits fast phase locking. The principal drawback is the tremendous difficulty involved in integrating the required high-Q filter on a chip. The filter architecture also suffers from temperature, process and frequency dependencies, and requires a process with high $f_t$.
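The transition detection performed by the XOR gate and the delay element can be illustrated with a small behavioural model. This is a minimal sketch, not the circuit itself: the bit pattern, the oversampling factor and the choice of a delay of half a bit period are assumptions made for illustration, and all function names are hypothetical.

```python
def nrz_waveform(bits, samples_per_bit):
    """Oversampled NRZ waveform: each bit is held for samples_per_bit samples."""
    return [b for b in bits for _ in range(samples_per_bit)]

def xor_with_delay(wave, delay):
    """XOR of the waveform with a copy of itself delayed by `delay` samples.

    The delayed copy is padded with the first sample, so no pulse is
    produced before the first real transition.
    """
    delayed = [wave[0]] * delay + wave[:len(wave) - delay]
    return [a ^ b for a, b in zip(wave, delayed)]

def count_pulses(wave):
    """Number of rising edges (0 -> 1) in a binary waveform."""
    return sum(1 for prev, cur in zip(wave, wave[1:]) if prev == 0 and cur == 1)

bits = [1, 0, 0, 1, 1, 1, 0, 1, 0, 0]   # arbitrary example pattern
samples_per_bit = 8                     # assumed oversampling factor
delay = samples_per_bit // 2            # assumed delay: half a bit period

wave = nrz_waveform(bits, samples_per_bit)
pulses = xor_with_delay(wave, delay)
transitions = sum(1 for a, b in zip(bits, bits[1:]) if a != b)

print("data transitions:", transitions)
print("XOR pulses:      ", count_pulses(pulses))
print("pulse width (samples):", sum(pulses) // transitions)
```

Each data transition produces exactly one pulse whose width equals the delay, while runs of identical bits produce no pulses at all, which is why the ringing tank eventually rings down during long CID sequences.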
The PLL is simpler to integrate because it does not need a high-Q filter. It also has spectral independence<sup>3</sup> due to the presence of an integrated clock (VCO). The architecture is shown in fig. 6.5. However, the low-pass filter required to filter the phase detector (PD) signal may require large passives, and the PLL generally exhibits far slower phase lock compared to the filter<sup>4</sup>. The PLL also suffers from temperature, process and frequency dependencies similar to the ringing tank.

<sup>3</sup>Extended periods of CID have no effect on the amplitude of the clock signal, but cause the phase (and frequency) to wander. Each packet of incoming data must be preceded by a preamble of sufficient length, to ensure that synchronization has been achieved before the valid data arrives.

<sup>4</sup>Information has transpired from a 40 Gb/s project at Alcatel. The minimum number of bits required to lock the phase is roughly 10 and 100 for a ringing tank and a PLL respectively. Thus, the difference is about an order of magnitude.

Figure 6.5: Architecture of a PLL type CDR circuit.

Integration is the key issue at 40 Gb/s and beyond. The PLL architecture is superior to the ringing tank because it offers the simplest integration. An additional benefit is that certain types of phase detectors in the PLL may be used to demultiplex the data, lowering the output bandwidth requirements. It may also be added that (currently) most optical networks employ NRZ<sup>5</sup>, favoring the PLL architecture.

### 6.1 Architecture

There are many variations of the PLL architecture, because each of the components can be realized in numerous ways. A PLL is roughly classified based on its type of phase detector. Phase detectors may be either linear, also known as Hogge-type, or non-linear, also known as binary, bang-bang or Alexander-type.
Furthermore, the architecture may be either full-rate or half-rate, denoting the relation between the data sampling rate<sup>6</sup> and the data rate. The distribution of fast, synchronized data and clock signals is notoriously difficult to realize and it becomes even more complicated when an asymmetric (with respect to layout) design has to be realized. A symmetric schematic does not necessarily translate into a symmetric layout when physical constraints are present. A full-rate system requires simpler circuits (less complexity equals fewer transistors, less power consumption, smaller chip area, improved yield and less cost), but has more broadband data and high-frequency clock signals present on-chip. Also, the circuits themselves have to operate at a higher bandwidth or frequency. A half-rate system appears easier to achieve when the transistor speed $(f_t, f_{max})$ is the limiting factor, but the architecture is more complex.

<sup>5</sup>The choice of modulation has numerous implications and has to be selected based on its particular merits. RZ is typically used for underwater transmission and more complex modulation schemes are being employed to improve e.g. Inter Symbol Interference (ISI).

<sup>6</sup>The data sampling rate refers to the sampling rate of the latches. The incoming data may be distributed into multiple channels with latches sampling alternate bits. This reduces the sampling rate of the latches, but requires parallel latches to sample the entire signal (and thus more circuitry). The VCO itself may operate at a higher frequency than the sampling rate, with the VCO clock signal being divided prior to being distributed to the latches.

An example of each architecture will demonstrate the differences, see figs. 6.6 & 6.7 for a comparison. The full-rate design has several high-frequency clock (green) and broadband data (blue) signals and components, whereas the same routing problem in the half-rate design is limited to the initial distribution and latching of the data signal. The full-rate design is based almost exclusively on very fast circuits, whereas these circuits are confined to the initial data capture in the half-rate design. The half-rate example also has the added benefit of producing two demultiplexed streams, simplifying the interface to subsequent processing. The clock division is necessary for the synchronization of the sampled data and would most likely be used by a subsequent demultiplexer.

Figure 6.6: Full-rate CDR circuit design with emphasised high-speed circuits and interconnects.

Figure 6.7: Half-rate CDR circuit design with emphasised high-speed circuits and interconnects.

A review of articles relating to CDR circuits published in the last decade reveals that each new bandwidth milestone was first reached by a half-rate design, followed by a full-rate design about two years later [3, 5, 6, 4, 7, 10]. After that, more functions were added to the CDR circuits (such as 1:4 or 1:16 demultiplexing), further increasing complexity, and previously separate circuits were integrated on the same die.

Both physics and history thus endorsed the choice of a safer, but more cumbersome, half-rate design. This would minimize the risk of not having working silicon (or rather InP) on the first run. Two different approaches were developed, based on the published designs, particularly [3, 2, 4]. The variations are shown in figs. 6.8 & 6.9. An amplifier has been added between the data input and the phase detector, for reasons that will be explained in the following section. Both designs employ a half-rate architecture. The phase detector is an Alexander-type containing a 1:2 demultiplexer (DEMUX).

Figure 6.8: Half-rate CDR with Alexander type phase detector.

Figure 6.9: Half-rate CDR with double Alexander type phase detector.
This reduces the bandwidth of the recovered data to a more manageable level. The output data (50 Gb/s) and clock (25 GHz) signals are suitable as the input of a subsequent demultiplexer.

The phase detectors are different. The Alexander-type phase detector in fig. 6.8 uses a half-rate, differential clock whereas the double Alexander-type phase detector in fig. 6.9 requires a half-rate, four phase clock. The four phase clock is generated by the combination of a VCO and a static divider, see chapter 3. The full-rate clock is divided into two separate half-rate clock signals with a 90° phase shift between them. The details of the two CDR circuit designs will be discussed in the following sections.

## 6.2 Receiver interface

The incoming optical signal is carried by photons. The photo detector transforms the optical signal into an electrical current proportional to the power of the optical signal<sup>7</sup>. The optical power varies over time for a number of reasons. For example, the time slots could be distributed between several sources that are not necessarily placed at the same distance (length of fiber) from the receiver. The difference in distance between source and receiver causes a difference in attenuation. No significant variation in amplitude is expected within a single time slot. A TIA transforms the current signal into a proportional voltage signal. The result is that the voltage signal received by the CDR circuit follows the optical (power) signal and that the signal may vary greatly in amplitude over time.

The solution is to insert a non-linear amplifier in the data path to reamplify and reshape the signal. The amplifier must give the signal a constant amplitude independent of the variable amplitude of the incoming signal, assuming that the signal level is not unreasonably weak. At least two solutions are possible: either a limiting amplifier (LA) or an automatic-gain-control (AGC) amplifier.
The LA and AGC are typically of similar design, the AGC having an additional stage for gain control. An LA is chosen because it is somewhat simpler to realize, and in this case, an external signal for gain control is not required. The LA is also sufficient because the output is intended to have a fixed amplitude.

<sup>7</sup>The flux of electrons (current) is proportional to the flux of incoming photons. The power of the optical signal is also proportional to the flux of incoming photons because the photons all have the same energy (and wavelength).

The receiver data path contains a high-speed (100 Gb/s) electrical signal, which is very difficult to transmit even over short distances. Allocating the different functions in the transceiver block diagram, fig. 1.1, to separate chips (or even dies) would necessitate complicated and costly interfaces. It would be a much better solution to integrate as many of the functions as possible. The LA is easily integrated on the same chip as the CDR circuit because the LA is relatively small in size, has a relatively low power consumption, and can be realized using the same technology. Also, an unbuffered CDR circuit would require an amplifier anyway. The combined LA and CDR circuit is able to perform full 3R (reamplification, reshaping and retiming). The addition of a PD providing 1:2 demultiplexing further integrates the broadband building blocks of the O/E-interface and simplifies clock and data output interfacing.

# 6.3 Limiting amplifier

The LA has to operate over a very wide bandwidth, and has been designed according to many of the principles described in [93]. A broadband amplifier in an optical network has to operate down to very low frequencies; about 30 kHz [93] in the optical fibers<sup>8</sup>. The amplifier must therefore be DC-coupled. The LA consists of a chain of cascaded transadmittance stages (TAS), transimpedance stages (TIS) and emitter followers, as shown in fig. 6.10.
Figure 6.10: Broadband amplifier employing impedance mismatch between the stages.

<sup>8</sup>Burst mode transmission requires DC coupling to adjust to different signal levels between bursts.

The input and output impedances of adjacent stages are usually matched to prevent reflections. The same is true for the intermediary transmission lines, if they are considered long. This methodology works very well for a limited bandwidth, because impedances remain fairly constant across narrow bands, but it is very difficult to achieve good matching over a wide spectrum. The resulting mismatch will cause the amplifier to exhibit a strongly frequency dependent behavior and have a low cutoff frequency.

The solution is to employ the principle of strong impedance mismatch between adjacent stages. This drastically reduces the influence of the likewise strongly frequency dependent input and output impedances. Adjacent stages are mismatched, but the degree of mismatch (quantified as the reflection and transmission of incident waves) will remain fairly constant up to very high frequencies, resulting in a high cutoff frequency. The mismatch also has the advantage that parasitic capacitance located between the stages will be connected to ground through a low impedance, either at the input or the output. This will reduce the influence of the parasitic capacitance and thus further increase the cutoff frequency. The emitter followers are added to the stage chain to improve the insufficient mismatch between the TIS and TAS stages. The mismatch principle does have some drawbacks, e.g. some cost to sensitivity.

The schematic of the LA is shown in fig. 6.11. The LA input impedance is set to 50 $\Omega$, matching the 50 $\Omega$ transmission lines to the probe.

Figure 6.11: Schematic of the limiting amplifier. The different stages are colored according to type, similar to the block diagram in fig. 6.10.
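The claim that a strong mismatch stays "fairly constant" can be made concrete with the standard reflection coefficient, $\Gamma = (Z_2 - Z_1)/(Z_2 + Z_1)$, at the interface between a driving impedance $Z_1$ and a load impedance $Z_2$. The sketch below is a simplified illustration only: it assumes purely real impedances, and the impedance values and the factor-of-two variation are hypothetical numbers, not values taken from the LA design.

```python
def reflection_coefficient(z_source, z_load):
    """Reflection coefficient at an interface between two real impedances."""
    return (z_load - z_source) / (z_load + z_source)

def gamma_spread(z_source, z_load_nominal, factor=2.0):
    """Change in the reflection coefficient when the load impedance varies
    between z_load_nominal / factor and z_load_nominal * factor."""
    low = reflection_coefficient(z_source, z_load_nominal / factor)
    high = reflection_coefficient(z_source, z_load_nominal * factor)
    return high - low

# Nominally matched interface: 50 ohm driving 50 ohm (illustrative values).
matched_spread = gamma_spread(50.0, 50.0)

# Strongly mismatched interface: 5 ohm driving 500 ohm (illustrative values).
mismatched_spread = gamma_spread(5.0, 500.0)

print(f"matched interface:    Gamma varies by {matched_spread:.3f}")
print(f"mismatched interface: Gamma varies by {mismatched_spread:.3f}")
```

Under these assumptions, the same relative variation in load impedance changes the reflection coefficient far less at the strongly mismatched interface than at the nominally matched one, which is the behaviour the mismatch principle relies on.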
The LA demonstrates many of the features typical of broadband amplifiers:

- Multiple emitter followers. The emitter followers perform level shifting as well as decoupling (impedance transformation) of the signal. The effective current gain of transistors decreases with frequency, reducing the decoupling. Thus two (or even three) cascaded emitter followers are often required. However, the improved decoupling comes at the expense of some loss to signal amplitude. The emitter followers also provide gain peaking near the upper frequency limit, extending the bandwidth.
- Differential operation. This approach has several advantages. The two signal carriers are not only complementary with symmetrical transients, but are also closely spaced, reducing the effects of crosstalk and interference. The pulse edges are steeper, allowing for higher data rates, with improved eye diagrams.
- Low voltage swing. Decreasing the resistance of the load resistors decreases the voltage swing but also gives steeper pulse edges. The voltage swing must be sufficient to ensure against signal loss. A voltage swing of 300 mV has been implemented for digital circuits throughout the design.
- On-chip matching. The interface between the external signal generator and the LA has to be matched to avoid reflections. The return loss is improved by terminating the signal locally on-chip (close to the circuit), in this case at the first stage of emitter followers of the LA. The LA input termination and circuitry is placed as close as possible to the input pads to minimize the losses in the transmission line.
- Maximising $f_t$ and $f_{max}$. The $f_t$ and $f_{max}$ of VIP-2 have a strong dependency on the collector current density. $f_t$ and $f_{max}$ reach their global maxima at around 3.8 mA/$\mu$m<sup>2</sup>, as seen in fig. 2.20. Critical transistors are generally biased to operate around this point, but it does result in a significant current (and power) consumption.
The heat generated by large circuits such as a CDR circuit can easily destroy the circuit if preventive and remediating measures are not taken. Less critical parts of a circuit could be more economical with the current density without any significant impact on performance. In some cases, such as current sources, $f_t$ and $f_{max}$ should actually be minimised. Stability is the hallmark of good sources and slower transistors are less susceptible to noise.

The LA stages are distributed in three sections, separated by transmission lines. This distribution is primarily motivated by practical (layout) reasons, but it also helps by dividing the inevitable transmission line between the input pad and the LA into four smaller pieces. The phase detector requires the data signal to be distributed to two (or four) D flip-flops (depending on the choice of phase detector) placed in parallel, each consisting of two latches. The signal is split before the final TIS and emitter follower stages. These stages are also placed in parallel, adjacent to the phase detector. This requires the signal to be split at the egress of the preceding TAS stage. These transmission lines will then become fairly long (about 200 $\mu$m for a 1:4 split), see fig. 6.12. The transmission lines at the split must be symmetric to avoid a relative phase shift.

The AC-response of the 1:2 LA with dual output is shown in figs. 6.13 & 6.14. The AC-response is shown for both a terminated and an open-ended LA (i.e. without the 46 Ω load of the D latch). The terminated LA has a 3 dB bandwidth of 83.1 GHz and an amplification of 22.6 dB (19.6 dB plus an additional 3 dB because of the 1:2 signal split). The first stage of the LA has a cut-off frequency at 121 GHz. The group delay is shown in fig. 6.15. The group delay shows some peaking near the upper frequency limit, resulting in signal distortion. The performance of the LA was a compromise between amplification, bandwidth and linearity [114].
The bandwidth could be improved by some additional peaking near the upper frequency limit, but this also serves to increase the peaking of the group delay.

The small signal AC characteristics are not very revealing of actual performance, because of the non-linear limiting characteristics. The final verification must be performed in the time domain, by studying the large signal amplification. The amplified output signals for 25 mV, 50 mV & 100 mV differential input signals are shown in fig. 6.16. There is not much difference between the 50 mV & 100 mV signals. The small 25 mV input signal actually presents an improvement with regard to output signal shape, because saturation peaking is avoided. The LA remains nearly linear up to saturation, even for high amplitudes. This was achieved by optimizing each stage to be as linear as possible, up to this point.

Figure 6.12: Layout of limiting amplifier (1:4 version). The stages are fairly compact, except where the signal is split 1:4.

A differential PRBS at the desired bit rate can be complicated to generate and distribute, simply because the phase shift in the connectors can be difficult to match. A likely test scenario (similar to that of the static divider in section 3.5) will be to attempt a single-ended input using one input pad, and setting a suitable DC voltage on the complementary pad. This scenario also figures in several of the published results [3, 2, 4, 7]. The amplified output signals for 50 mV, 100 mV & 200 mV single-ended input signals (comparable in power to the differential amplitudes above) are shown in fig. 6.17. The complementary input pad is fixed at the DC level of the single-ended signal. The weak 50 mV signal does have a noticeable effect on the voltage swing of the LA output. The single-ended input signal is not as efficient as the differential, though the DC level of the complementary input pad can be optimized to slightly improve the result.
It would be possible to remove the fixed voltage input pad entirely if only a single-ended input is used, but an attempt to employ a differential PRBS generator [115] (also designed as part of the OptCom project, R3) should be made if possible. This PRBS generator has a variable bit rate and could be used to verify the locking range of the CDR circuit around 100 Gb/s<sup>9</sup>.

Some free space was utilized on R1 to manufacture the LA, as shown in fig. 6.18. Unfortunately, R1 suffered from dismal yield, and we were unable to find any working circuits. The problem was eventually identified as short circuit defects in the MIM-capacitors, used extensively on the chip to decouple the supply voltage. It is common practice to use any unused chip area for decoupling, and this is clearly shown in the photo. The photo is of the (obsolete) 1:4 version, used in the first version of the CDR circuit. The current 1:2 version has not been implemented as a separate chip.

<sup>9</sup> It has later transpired that the PRBS generator is only useful at frequencies up to about 80 Gb/s. The eye diagrams become too distorted at higher frequencies. It is therefore not possible to use it as a 100 Gb/s source.

Figure 6.13: Limiting amplifier amplitude characteristics.

## 6.4 Phase detector

Two CDR circuits have been developed. The main difference lies in the choice of phase detectors. The first CDR circuit uses a double Alexander-type phase detector, whereas the second CDR circuit uses a Hogge-type phase detector. The Alexander-type is non-linear and indicates whether the clock is either early or late. The Hogge-type is linear and indicates how much the clock is either early or late.

### 6.4.1 Hogge-type phase detector (linear)

A Hogge-type phase detector is shown in fig. 6.19. It is a linear half-rate detector that uses a half-rate VCO and the distribution of the data and clock signals is relatively straightforward.
The half-rate phase detectors presented in [12, 5, 6, 7, 10, 11] employ the same principle.

The linear phase detector has the architecture of a simple 1:2 DEMUX. This enables the phase detector to perform both functions. The data signal is split into two separate paths by latching the data signal at alternate clock flanks of the half-rate clock. The result is that the two ingress latches capture alternate bits, and both the latched $(Q_1/Q_3)$ and the stored<sup>10</sup> $(Q_2/Q_4)$ data are accessible to the phase detector logic. Two XOR gates compare the latched and stored data and use them to generate two control signals. The logic for the control signals can be stated as:

$$E = Q_1 \oplus Q_3 \tag{6.1}$$

$$R = Q_2 \oplus Q_4 \tag{6.2}$$

Figure 6.14: Limiting amplifier phase characteristics.

The phase detector operation is illustrated in figs. 6.20 & 6.21 for early and late clock signals. The reference signal (R) is almost identical in the two cases, regardless of whether Clk is early or late. Eq. 6.2 describes a comparison between two adjacent bits in the incoming data stream. A phase shift in Clk will only result in an identical phase shift of the stored data and thus an identical phase shift of R. It is important to notice that R is a reference signal that indicates a transition between two bits of reverse polarity. The absence of transitions will have an impact on R and the effect is highlighted in fig. 6.20. The transition (two consecutive bits are different<sup>11</sup>) frequency is 50% if a random sequence is assumed.

The error signal (E) indicates the phase difference between Clk and Data. Like R, it generates a pulse whenever a transition occurs. The phase difference between Clk and Data is given by the duty cycle of E. The pulses of E & R are highlighted in fig. 6.21. A duty cycle of 50% indicates that Clk and Data are synchronized.
The duty cycle can ideally vary between 0% and 100% for early ($-90^{\circ}$ with respect to Clk) and late ($+90^{\circ}$ with respect to Clk) arrival of Clk, respectively. The duty cycle is 75% in fig. 6.21, indicating that Clk is $+45^{\circ}$ late, and 25% in fig. 6.20, indicating that Clk is early.

<sup>10</sup> Meaning stored by the D flip-flop.

<sup>11</sup> The probability of the pattern ("...01..." or "...10...") being equal to the pattern ("...00..." or "...11...").

Figure 6.15: Limiting amplifier group delay.

The XOR gates require complementary data to operate, otherwise E & R will not give any indication of Clk phase. Complementary data occurs when two consecutive bits are different and a transition occurs. This makes the linear phase detector sensitive to the data pattern. Extended periods without transitions (consecutive identical digits, CID) will give no useful feedback and eventually the VCO will wander outside the phase margin of the D latches. Synchronization will then have been lost and subsequent data becomes unreliable. The E & R signals are never invalid, nor are they always useful.

The layout of the linear phase detector (without logic) is shown in fig. 6.22. Data progresses from left to right. Clock is distributed through a point in the middle of the D latches, to ensure that it reaches all D latches without any phase difference.

### 6.4.2 Alexander-type phase detector (non-linear)

An Alexander-type phase detector is shown in fig. 6.23. The half-rate phase detectors presented in [1, 3, 2, 4, 8, 9] use the same principle. The core is similar to the Hogge-type phase detector described in the previous section. The data is likewise demultiplexed into two separate channels, but the top channel is only used for demultiplexing and is actually not a part of the phase detector itself. It is included to emphasize the topological similarity with the previous phase detector.
The operational principle of the Alexander-type phase detector is to compare the data captured by the reference clock (Clk), with data captured by a clock with a $90^{\circ}$ phase shift (Clk90). The sampling operation is demonstrated in fig. 6.24. This requires a quadrature clock signal, making the design more complex. Two clocks have to be generated and distributed. Clk90 (as well as Clk) is generated by either a quadrature divider from a full-rate clock signal or as a delayed version of Clk. The comparison of the two bits can be performed by an XOR-gate:

$$E/L = D_2 \oplus T \tag{6.3}$$

Figure 6.16: Limiting amplifier with differential signalling.

| Pattern | D | T | Early (0)/Late (1) | Probability |
|---------|---|---|--------------------|-------------|
| 00 | 0 | 0 | 0 (False) | 25% |
| 01 | 0 | 0 | 0 | 12.5% |
| 01 | 0 | 1 | 1 | 12.5% |
| 10 | 1 | 1 | 0 | 12.5% |
| 10 | 1 | 0 | 1 | 12.5% |
| 11 | 1 | 1 | 0 (False) | 25% |

Table 6.1: Logical table for the Early/Late signal in an Alexander-type phase detector.

The logical table of the comparison is shown in table 6.1. Notice that the phase detector is unable to detect a late Clk if two consecutive bits are identical, resulting in an erroneous E/L, indicating an early Clk, regardless of the actual Clk state.

The operation of the Alexander-type phase detector is illustrated in figs. 6.25 & 6.26 for early and late clock signals. Both the reference data and the shifted data will be identical if Clk is early, because Clk90 will arrive early enough to latch the same data as Clk. Fig. 6.25 shows that the $D_2$ & T data streams are identical, except for a $90^{\circ}$ phase shift corresponding to the time when the data was latched. The XOR operation would not be very exciting, were it not for the $90^{\circ}$ phase shift of Clk90. The phase shift is mirrored in the shifted data, and the result is that small ($90^{\circ}$) pulses occur in the event of a transition.
Again, the probability of a transition is 50% assuming a random sequence.

Figure 6.17: Limiting amplifier with single-ended signalling.

Clk and Clk90 will latch adjacent bits if Clk is late. The probability that the bits are different is 50% (assuming a random sequence). This is picked up by the XOR operation. The pulse width is elongated by $90^{\circ}$ due to the phase shift of Clk90.

The comparison of the reference and shifted data for both early and late clocks requires that adjacent bits are different, otherwise the comparison will be pointless. This occurs 50% of the time if the data is random, but the pattern dependency means that CID is a definite threat.

The $90^{\circ}$ phase shift between $D_2$ & T can be removed. This is done by adding a Clk controlled latch in series with the reference and shifted data. This will be demonstrated in the next section.

### 6.4.3 Double Alexander-type phase detector (non-linear)

A more robust phase detector can be constructed by combining two phase detectors and thus make twice the number of comparisons. This is the strategy pursued in [3, 2, 4, 8]. The architecture can be referred to as a double Alexander-type phase detector (2A). The schematic is shown in fig. 6.27. The single Alexander-type (1A) phase detector distributes the incoming data into two separate channels but only one of these channels (D) is examined and compared to a reference (T). The 2A phase detector operates on both channels simultaneously. This increases the phase detector gain, because useful comparisons can now be made 75% of the time (assuming a random bit pattern).

The logic performed by the 2A phase detector is a bit more complex in the [3, 2, 4, 8] implementations. It compares two adjacent bits<sup>12</sup> with an intermediary sample. The sampling operation is demonstrated in fig. 6.28. The logical table of the comparison is shown in table 6.2. The advantage of comparing the intermediary sample, T, with the two adjacent bits is that there are no false Early/Late-flags, as is the case for the 1A phase detector. The effect is a reduction in jitter generation on the VCO control signal, cf. section A.3.

<sup>12</sup> Adjacent in the original data stream, not the demultiplexed streams. The phase detector logic uses data from both of the demultiplexed streams simultaneously to compare bits that are adjacent in the incoming stream, cf. fig. 6.27.

Figure 6.18: Photo of the limiting amplifier chip (1:4 version).

Figure 6.19: Linear phase detector schematic.

The logic table also explains why two of the four channels in fig. 6.27 use both of the clock signals. This is because the data in the last D latches (T) has to be synchronized with the rest of the data (C & D) to make a simultaneous comparison. The logical expressions of the full 2A phase detector can be stated as:

$$E = \overline{(D_1 \oplus T_1)} \times (C_1 \oplus T_1) + \overline{(D_2 \oplus T_2)} \times (C_2 \oplus T_2) \tag{6.4}$$

$$L = (D_1 \oplus T_1) \times \overline{(C_1 \oplus T_1)} + (D_2 \oplus T_2) \times \overline{(C_2 \oplus T_2)} \tag{6.5}$$

Figure 6.20: Linear phase detector operation as a function of the incoming data. Clk is early.

| Pattern | D | T | C | Early | Late | Probability |
|---------|---|---|---|-------|------|-------------|
| 00 | 0 | 0 | 0 | 0 | 0 | 25% |
| 00 | 0 | 1 | 0 | 0 | 0 | 0% |
| 01 | 0 | 0 | 1 | 1 | 0 | 12.5% |
| 01 | 0 | 1 | 1 | 0 | 1 | 12.5% |
| 10 | 1 | 0 | 0 | 0 | 1 | 12.5% |
| 10 | 1 | 1 | 0 | 1 | 0 | 12.5% |
| 11 | 1 | 0 | 1 | 0 | 0 | 0% |
| 11 | 1 | 1 | 1 | 0 | 0 | 25% |

Table 6.2: Logical table for Early and Late signals in a double Alexander-type phase detector.

All of the XOR-factors appear in both the E and the L expressions. Reusing the same factors in both computations allows for the reduction in circuit complexity.
This is demonstrated in the 2A schematic, fig. 6.27, where the same XOR gates are used to produce both of the control signals. There are no inverters in an actual design. All of the signals are differential and an inverted signal is accomplished merely by switching the two signal lines.

The operation of the 2A phase detector is illustrated in figs. 6.29 & 6.30 for early and late clock signals. The phase detector determines the early or late state very well, except when there are more than two consecutive identical bits. Such an event (and the result thereof) has been highlighted in both figs. On average, this only occurs 25% of the time, assuming random data. The phase detector will thus return a valid response during the remaining 75% of the time.

Figure 6.21: Linear phase detector operation as a function of the incoming data. Clk is late.

The layout of the 2A phase detector (without logic) is shown in fig. 6.31. Data progresses from left to right. The size of the phase detector makes it difficult to achieve a good clock distribution, i.e. a clock signal that has the correct phase at all of the latches. To make matters even more complicated, a quadrature clock is employed, resulting in four parallel clock lines. The clock load should also be symmetric, which just happens to be the case in fig. 6.27. A closer look reveals that two of the D latches (at the bottom) have no bearing on the operation of the phase detector. They are connected as a T flip-flop, similar to the static divider in chapter 3. The clock frequency of the distributed clock in the half-rate architecture is only 50 GHz and the output from the T flip-flop is a 25 GHz clock that is synchronized with the demultiplexed data (50 Gb/s). The purpose of this synchronized clock is to facilitate further demultiplexing. Adding this T flip-flop to the clock networks is not quite enough to provide an even load. A further two (dummy) D latches are added (at the top) to make the load fully symmetric.
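The Early/Late logic of eqs. 6.4 and 6.5 can be checked against table 6.2 with a short script. The following is a minimal sketch, not part of the original design flow: it evaluates a single channel (one term of each equation), and the function name `early_late` and the dictionary `TABLE_6_2` are hypothetical names introduced here for illustration.

```python
from itertools import product


def early_late(d, t, c):
    """Early/Late flags for one channel, following one term of eqs. 6.4 and 6.5.

    d, c: two bits that are adjacent in the incoming stream (0 or 1).
    t:    the intermediary sample taken between them (0 or 1).
    """
    dt = d ^ t                # D xor T
    ct = c ^ t                # C xor T
    early = (1 - dt) & ct     # not(D xor T) and (C xor T)
    late = dt & (1 - ct)      # (D xor T) and not(C xor T)
    return early, late


# Rows of table 6.2, keyed by (D, T, C) with (Early, Late) as the value.
TABLE_6_2 = {
    (0, 0, 0): (0, 0),
    (0, 1, 0): (0, 0),
    (0, 0, 1): (1, 0),
    (0, 1, 1): (0, 1),
    (1, 0, 0): (0, 1),
    (1, 1, 0): (1, 0),
    (1, 0, 1): (0, 0),
    (1, 1, 1): (0, 0),
}

if __name__ == "__main__":
    for d, t, c in product((0, 1), repeat=3):
        print(d, t, c, early_late(d, t, c))
```

In this sketch the two flags are never raised at the same time, and neither flag is raised when D and C are identical, which corresponds to the absence of false Early/Late flags noted for the 2A detector.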
In conclusion, the 2A phase detector has more gain, because it makes twice the comparisons. It also has superior jitter performance, because it has been fitted with a more robust logic that does not make erroneous decisions. The major drawback is the increased complexity and problems derived from this complexity (layout problems, die size, power consumption, heat etc.). Like the Hogge-type and the 1A phase detectors, the 2A isn't immune to CID either.

### 6.4.4 D flip-flops

A number of D latches are used to realize the D flip-flops of the phase detector. Two D latches are placed adjacent to each other and connected together to form a D flip-flop, as shown in fig. 6.32. The schematic of the D latches is shown in fig. 6.33. The D latches in the CDR circuit are topologically identical to the D latches used in the static divider; cf. section 3.2. However, there are a number of differences with respect to sizing and layout.

The CDR circuit is based on a half-rate architecture. This involves two parallel D flip-flops latching alternate bits of the incoming data stream. The first D latch of a D flip-flop has to latch alternate bits of the full, 100 Gb/s data stream, while the second latch merely latches the $50~\mathrm{Gb/s}$ data from the first latch. The operation at the first latch is critical, both because of the higher bandwidth of the input signal and also because the logic of the phase detector requires very well defined signals (with steep flanks) to operate properly<sup>13</sup>.

<sup>13</sup> In the CDR circuit, the phase detector is actually a more critical component than the D latch.

Figure 6.22: Layout of the linear phase detector.

Figure 6.23: A non-linear phase detector schematic.

Figure 6.24: Single, non-linear phase detector operation using two samples.

Two CDR circuits have been designed. Their architectures are similar but they differ with respect to the choice of phase detector and the consequences of that choice, cf. section 6.4.
The first version employs a quadrature phase detector, whereas the second version uses a simpler differential phase detector.

The first version of the CDR circuit requires eight D flip-flops, each with a considerable current/power consumption. The primary $100~{\rm Gb/s}$ half-rate D latches located directly at the $100~{\rm Gb/s}$ data interface are the most critical circuits, whereas the secondary, subsequent latches are less so. The power consumption is limited by reducing bias currents and resizing transistors in the subsequent latches, which can be done without unduly affecting performance.

The second version of the CDR circuit merely requires two D flip-flops, and thus significantly less power. Still, the secondary latches could use current more sparingly without any noticeable loss of latch performance. The problem with this tactic is that the phase detector in the second version of the CDR circuit (Hogge-type) is considerably more dependent on the shape and latency of the latch output signals than the first version (Alexander-type). Because of this, both latches in the flip-flops of the second version are of the same, fast type, emphasizing speed.

The D latch schematic, fig. 6.33, shows that inductive peaking is employed. This is only true of the primary latches in both of the CDR circuits. The reason for this choice is purely practical. The M4 microstrip takes too much space and cannot be placed without interfering with the clock signal distribution.

Figure 6.25: Non-linear phase detector operation as a function of the incoming data. Clk is early.

Figure 6.26: Non-linear phase detector operation as a function of the incoming data. Clk is late.

There are numerous ways to reduce power consumption in latches at the cost of performance, both in topology (e.g. no level shifting/poor biasing) and implementation (e.g. asymmetric write/hold). The performance required by a 100 Gb/s CDR
circuit precludes most of these techniques from being implemented, and thus from being analysed here.

The transient response of a D flip-flop with a 100 Gb/s PRBS input signal is shown in fig. 6.34. The clock is half-rate, thus resulting in alternate bits being latched. The output is shown for both D latches, and the second D latch can be seen trailing the first D latch by 10 ps (equivalent to a clock phase shift of 180°) as expected. The first D latch lets through some spikes corresponding to the bits adjacent to the latched bits, but these spikes are eliminated in the second latch. There is some obvious switching noise as well, relating to the clock signal.

The eye diagrams for the same signals, shown in fig. 6.35, are very pleasing when clock and data are in phase. Both the first and the second D latches exhibit a very wide and clearly defined opening. Simulations indicate that the D flip-flops are capable of capturing data in excess of $150\,\mathrm{Gb/s}$, but the extracted parasitics are definitely a bit on the pessimistic side. A MUX, using nearly identical latches, was manufactured during the OptCom project [74]. Measurements showed "acceptable" eyes at 165 $\mathrm{Gb/s}$; testimony to the VIP-2 capabilities.

Figure 6.27: A double, non-linear phase detector schematic. The D latches are color-coded to indicate the phase of their respective clocks and show how the intermediary samples $(T_1 \& T_2)$ are retimed to be in phase with the data $(C_1, C_2, D_1 \& D_2)$.

Figure 6.28: Double, non-linear phase detector operation using three samples.

### 6.4.5 Logical gates

The phase detector logic is made up of logic XOR and NAND gates.

#### 6.4.5.1 XOR gates

The differential XOR gates have a topology similar to the D latches, though with some minor differences in sizing and without inductive peaking being employed. The schematic is shown in fig. 6.36.

Figure 6.29: Double, non-linear phase detector operation as a function of the incoming data. Clk is early.

The XOR gates form all (Hogge-type) or part (Alexander-type) of the phase detector logic. The Hogge-type phase detector is very sensitive to signal shape and latency. It is important for the logic to generate as well defined pulses as possible. This is achieved by appropriate sizing, biasing and layout, but the XOR gates still exhibit a fundamental problem. The schematic clearly shows that the two input signals are not treated symmetrically. The B port requires an extra set of emitter followers, and the switching transistors are placed in series rather than parallel. This means that the output signal, Q, would react differently (speed, shape etc.) to transitions on the two input ports. A delay, in the form of a transmission line, is inserted in the path of the A port to equalize the delay, but the waveforms will still retain a different form, depending on which input port (or both) the transition occurs on.

The results of a simulation containing three different versions of the XOR gate are presented in fig. 6.37. The choice of parameters has a noticeable effect on the waveforms, particularly with respect to spikes. The third version (v3) is a plain XOR gate with minimum size transistors (but optimized current densities). The second version (v2) has larger transistors with slightly better performance, but it requires more power and the improvement is minuscule. The first version (v1) introduces a delay on the A port equivalent to the propagation delay of an emitter follower. The example demonstrates that signal timing can be more important than component sizing.

Figure 6.30: Double, non-linear phase detector operation as a function of the incoming data. Clk is late.

#### 6.4.5.2 AND/NAND gates

The differential NAND gate is similar to the XOR gate. The circuitry is somewhat simpler, see fig. 6.38.
The NAND gate is even less symmetric than the XOR gate, with the differential B port being skewed with respect to Q. The NAND gate is only used in the Alexander type phase detector, so this asymmetry can be ignored. The problem could be eliminated by arranging the XOR gate a bit differently<sup>14</sup>. The NAND gate is differential. The output is complemented by switching the differential signals and the NAND gate becomes an AND gate.

<sup>14</sup> $/Q$ could be connected to three of the four collectors.

## 6.5 Charge-pump filter

The charge-pump filter generates a control signal to the VCO, based on the results from the phase detector. The control signal will adjust the frequency and phase of the VCO to match the frequency and phase of the incoming data.

Filters can be implemented on-chip or partially off-chip. Filters have often been implemented partially off-chip [21, 3, 5, 2, 6, 4, 9, 8, 77, 7], to simplify the realization of the large passives required by the low-pass filter, or to allow filter modification or tuning after the chip was produced. Such a modification or tuning could be necessary if the process variation is significant. On-chip filters have the advantages of integration, such as a reduced number of components and connections. It has been decided to integrate the filter used in this project. The justification is that integration is an efficient method for reducing the cost of production, and it should be implemented when possible.

Figure 6.31: The double, non-linear phase detector layout.

In this project, a charge-pump filter similar to [10, 11] was used (cf. [12] for an interesting variation). The schematic is shown in fig. 6.39. The filter core consists of a resistor, $R_S$, and a capacitor, $C_S$. The core is surrounded by two differential stages serving as charge-pumps, one for each of the control signals. The operation of the control signals depends on the type of phase detector.
The charge-pumps can differentially sense the E and R signals while maintaining the gain of the phase detector/charge-pump filter. The high gain results in a very small phase error once the clock and data have been locked.

Figure 6.32: A D flip-flop schematic.

Figure 6.33: The high-speed D latch schematic.

### 6.5.1 Hogge-type phase detector input

The Hogge-type phase detector has two control signals, E & R. R consists of a series of discrete pulses, each with a 100% duty cycle, in the event of a transition. E also consists of a series of discrete pulses in the event of a transition. However, the duty cycle of E depends on the phase difference between the clock and the incoming data. The duty cycle varies between 0-100%, 50% indicating that the clock and data are synchronized.

The probability of a transition between two adjacent bits is 50%, assuming random data. The result is that R is active 50% of the time<sup>15</sup> on average. However, E is only active 25% of the time (on average). This discrepancy between the control signals must be taken into account by the loop filter.

The schematic (fig. 6.39) shows two charge pumps belonging to two differential stages. The charge-pumps should be balanced when the VCO is in phase. The charge pumps are controlled by E & R. The charge-pump controlled by R is twice as active as the charge-pump controlled by E. The balance is accomplished by giving the charge-pump controlled by E twice the charge-pumping capacity; $I_{UP} = 2 \times I_{DOWN}$.

<sup>15</sup> (Duty cycle)×(probability of the pulse occurring).

Figure 6.34: D flip-flop data latching.

Figure 6.35: D flip-flop eye diagram.

Figure 6.36: The XOR-gate schematic. $Q = A \oplus B$.

### 6.5.2 Double Alexander-type phase detector input

The 2A phase detector also has two control signals, E & L. Both E & L consist of a series of discrete pulses, each with a 100% duty cycle. The pulses indicate whether the clock is early or late, and there are no false flags.
The 2A phase detector operates on three adjacent bits simultaneously and requires at least one transition within this pattern to make a valid detection. The probability of one or two transitions occurring within the three bit pattern is 75%, assuming random data. The result is that E & L together are active 75% of the time on average<sup>16</sup>. The symmetric behavior of E & L means that the symmetric loop filter architecture doesn't require balancing between the charge pumps; $I_{UP} = I_{DOWN}$.

<sup>16</sup> $E$ is active 75% of the time if the clock is early, and $L$ is active 75% of the time if the clock is late. Either signal will be active 37.5% of the time when the clock is synchronized.

Figure 6.37: Transient analysis of three variations of the XOR gate.

### 6.5.3 Charge-pump filter implementation

A good balance should be achieved between the two differential stages. This is simple when both stages use identical components (resistors, transistors etc.), but not when they have different dimensions. The current sources are based on current mirrors, but their performance depends on process variation and differs from the models. This can be taken into account by adjusting the biasing of one or two of the current sources, using an external signal or two. However, signal pads are not free.

For the 2A phase detector, it was decided to use the same transistors and resistors in both of the differential stages. The Hogge-type phase detector uses exactly the same component dimensions. The difference is that the high-capacity charge pump has twice the number of transistors in parallel. It doesn't make the circuit perfectly balanced, but it does take process variation into account<sup>17</sup>.

There is also an asymmetry in the (linear) E & R signals that is more difficult to deal with. The asymmetry and latency of the phase detectors mean that short E pulses, with a duty cycle of 0-15% (0-1.5 ps of a 10 ps period), do not slip through!
<sup>&</sup>lt;sup>17</sup>There are many other effects to take into account, such as the orientation dependency of doping, but these can be ignored in this application. Figure 6.38: Schematic of NAND-Gate. $Q = \overline{A \times B}$ . The rise and fall times also take a significant slice of the duty cycle. This problem limits the maximum bit rate the phase detector is able to interpret, and it is identified as the principal bottleneck in the CDR<sup>18</sup>. The loop filter parameters $(K_{CP}, \tau_1 \& \tau_2)$ , as defined in table 5.1, are related to the elements $(R_C, R_S \& C_S)$ of the charge-pump filter. The loop filter parameters can be expressed as: $$K_{CP} = g_m R_C \tag{6.6}$$ $$\tau_1 = (2R_C + R_S) C_S \tag{6.7}$$ $$\tau_2 = R_S C_S \tag{6.8}$$ A capacitor, $C_P$ , is placed in parallel to the series connection of $R_S$ and $C_S$ , with the intention of limiting the undesired high-frequency ripple included in the output signal, $V_C$ . The (time domain) response to a fixed phase error is shown in figs. 6.40 & 6.41 for two different sizes of $C_P$ . The waveforms demonstrate the importance of $<sup>\</sup>overline{\ \ \ }^{18}$ Increasing the bandwidth to 160 Gbit/s would effectively cut pulses having a duty cycle between 0-25% and perhaps affect even wider pulses. Figure 6.39: The charge-pump filter schematic. $C_P$ in reducing the ripple, but also that $C_P$ affects the locking speed. $C_P$ should be large enough to cancel out most of the ripple, to avoid generating jitter in the VCO, but not large enough to have an impact on locking speed. $C_P$ is chosen to be much smaller than $C_S$ , thus maintaining the stability of the PLL loop. The layout of the charge-pump filter is shown in fig. 6.42. The two sides of the charge-pump are clearly asymmetric. Both sides use the same transistor sizes, but one side uses twice the number of transistors placed in parallel. This (supposedly) yields better matching. 
## 6.6 Buffers

The two CDR circuits contain several buffers each. The buffers are used for reducing the load on circuits sensitive to fan-out, such as VCOs and static dividers, as well as drivers to reshape and reamplify digital signals in places such as the output ports. The buffers are all simple differential stages, similar to the buffers at the static divider; cf. section 3.3. The buffers are similar but slightly different; each buffer being optimized for its particular use. A typical buffer is exemplified by the schematic in fig. 6.43.

Figure 6.40: $V_C$ and $/V_C$ in the filter RC-tank.

This standard buffer can be modified in numerous ways:

- Input emitter followers. A buffer may or may not have emitter followers. Emitter followers have several advantages, but increase power consumption and add nothing to the logical function; cf. section 2.1.2. However, adding emitter followers provides better biasing of the differential stage by preventing saturation. All of the buffers used in both of the CDR circuits have input emitter followers. The buffers may also have emitter followers on the output ports.
- Transistor size of input emitter followers. The transistor size of the emitter followers can also be adjusted, which affects both the input load (primarily the capacitive part) and power consumption (assuming that the transistors are biased for optimum performance, cf. section 2.3). The output buffers of the static divider use minimum size transistors (emitter size $0.5~\mu m \times 1.0~\mu m$) in the emitter followers to minimize the load on the static divider core. Otherwise, the normal transistor (emitter) size for emitter followers is $0.5~\mu m \times 2.6~\mu m$, which offers the best performance. The same transistor size is used in all of the differential stages.
- Voltage supply. The CDR circuits have a dual power supply.
The reason is that a small part of the CDR circuits requires a V<sub>SS</sub> of -5.0 V (the VCOs), while -4.2 V will suffice for the rest of the circuits. Using a dual power supply lowers the total power consumption of the circuits. Changing the V<sub>SS</sub> for a circuit will affect the biasing conditions of the entire circuit. The transistor sizing can remain constant, but all resistors are modified accordingly. All of the buffers have -4.2 V and -5.0 V versions.
- Layout. The layout can be more compact if the available space is restricted. The largest components are typically the resistors, but they can be folded. An example is shown in fig. 6.44.

Figure 6.41: The $V_C$ (differential) time domain response to a fixed phase error. The waveform clearly demonstrates the RC-type delay and the smoothing effect of the ripple capacitor.

## 6.7 Linear amplifier

The control signal produced by the charge-pump filter has an amplitude of 70 mV (differential). This is insufficient to control the VCO within a useful frequency range; cf. chapter 4. The straightforward solution is to insert a linear amplifier, shown in fig. 6.45. It is a very simple design, based on the differential stage presented in the previous section. The two resistors provide a linearized voltage output up to the same amplitude, $R \times I_D$, as before. The slope is determined by the size of the resistors, as shown in fig. 6.46.

The linear amplifier consists of two of these linear stages placed in series. This is because two stages offer better linearity than one stage. The two stages are topologically identical, but component sizes are slightly different. The combined linear output voltage range is about 700 mV. The linear amplifier does not need to be very fast, and the design has therefore been optimized for low power consumption (i.e. minimum size transistors). The linear amplifier has a differential output, but the VCO uses a single-ended control signal.
The control signal only requires a very narrow bandwidth, and the high-frequency ripples from the charge-pump filter<sup>19</sup> are canceled out by an appropriate RC stage before the signal is distributed.

<sup>19</sup> The ripples will cause additional jitter when reaching the VCO.

Figure 6.42: The charge-pump filter layout. This is the asymmetric version for the linear phase detector.

# 6.8 Implementation of CDR circuits

Both of the CDR circuits are very large. Each circuit contains >100 transistors and a significant number of transmission lines. Extracting the circuit parasitics adds numerous small capacitors to the circuits. It is also possible to create an S-parameter model of an entire circuit<sup>20</sup>, making the circuit model even more complex. Detailed models and conservative simulation algorithms yield reliable results, but simulations can easily become impractically time-consuming. Precision has to be balanced with efficiency.

Figure 6.43: Schematic of a buffer based on a differential stage.

The VCO circuits presented in chapter 4 are sufficiently small to be represented by an S-parameter model. The same methodology is used to simulate the static divider circuit in chapter 3, but this circuit has a size that is probably close to the practical limit. Such a level of detail is not practical when simulating the massive CDR circuits.

All of the CDR subcircuits can be verified independently, but verifying the entire CDR circuit is a much more difficult task. Using the CDR circuit simulation results as feedback to make design changes would be a very time-consuming process; it takes about a week to simulate the locking process for the extracted CDR circuit. This is partly because of the circuit complexity, but also because the simulation covers a very long time span in which hundreds<sup>21</sup> of bits pass through the CDR while the locking process is being performed.
The CDR circuits are (for most purposes) too complex to simulate at the transistor level. The obvious solution is to follow the standard top-down design process. The process begins with the creation of a high-level model of the CDR circuit, using Analog Hardware Description Language (AHDL) to make behavioral models of all the subcircuits. Once the simplified CDR model becomes operational, the behavioral subcircuit models can be replaced one by one with increasingly detailed models. The final CDR model contains fully extracted models of all subcircuits.

<sup>20</sup> The S-parameter analysis of a circuit refers only to the metal layers. Circuit components such as resistors, capacitors and transistors are not taken into account. The circuit components are removed prior to the S-parameter analysis and every connection between a component and metal is replaced by a port. The program (ADS) calculates the S-parameters between all of the ports. The components are then placed between the ports and the circuit is simulated. Transmission lines use inherent models, rather than relying on extracted parasitics or S-parameter analysis. This is because the inherent models are deemed to be more precise representations. Certain transmission lines are replaced with simplified models to decrease simulation time.

<sup>21</sup> The filter is set to have a delay of about 50 bits; equivalent to 500 ps at 100 Gb/s.

Figure 6.44: Compact layout of a buffer based on a differential stage.

Having multiple models of varying complexity for each subcircuit makes it possible to evaluate the system impact of a particular subcircuit while using simple (but fast) models for the rest of the CDR circuit. The top-down design process is capable of producing the quick-and-dirty results of the CDR circuit on a system level, necessary for the repetitive design steps, such as adjusting the filter passives to appropriate settings.
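The mixed-fidelity idea described above can be pictured as simple bookkeeping: one subcircuit at a time receives its detailed model while the others keep their fast behavioral models. The Python sketch below is only an illustration under stated assumptions. The partitioning into five subcircuits, the fidelity labels and the helper `mixed_fidelity_runs` are hypothetical and do not correspond to any Cadence or ADS feature.

```python
from typing import Dict, Iterator, List

# Subcircuit names follow the text; the partitioning itself is an assumption.
SUBCIRCUITS: List[str] = [
    "limiting_amplifier",
    "phase_detector",
    "charge_pump_filter",
    "linear_amplifier",
    "vco",
]

# Hypothetical fidelity labels, ordered from fastest to most detailed.
FIDELITY_LEVELS: List[str] = ["behavioral", "schematic", "extracted"]


def mixed_fidelity_runs(detailed: str = "extracted",
                        default: str = "behavioral") -> Iterator[Dict[str, str]]:
    """Yield one model assignment per subcircuit.

    In each assignment exactly one subcircuit uses the detailed model,
    while all other subcircuits keep the fast default model.
    """
    for level in (detailed, default):
        if level not in FIDELITY_LEVELS:
            raise ValueError(f"unknown fidelity level: {level}")
    for target in SUBCIRCUITS:
        yield {name: (detailed if name == target else default)
               for name in SUBCIRCUITS}


if __name__ == "__main__":
    for assignment in mixed_fidelity_runs():
        print(assignment)
```

Each yielded dictionary would correspond to one system-level simulation, so the system impact of a single detailed subcircuit can be judged without paying the cost of a fully extracted CDR model.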
The connections between subcircuits pose a particular problem. They are not typically considered subcircuits but can (and sometimes should) be modeled as such. Short connections are simple to deal with and can be modeled using lumped components. Longer connections behave like transmission lines and must be modeled accordingly; cf. section 2.2. Cadence (as well as ADS) uses inherent models for transmission lines, and the choice of model depends on the geometry and bandwidth of a particular line. The inherent models are simple to use, but they are also complex and time-consuming to simulate. Furthermore, they exhibit an annoying tendency to cause convergence problems when finding the initial conditions. Finding a suitable model for a transmission line is not limited to the choice between a short-circuit and a broadband transmission line. A simplified, frequency specific model can be selected if the bandwidth is relatively small, which is the case for clock distribution networks. A home-made (frequency specific) RLGC-model with ten sections is even faster, cf. section 2.2.4, while maintaining a high level of precision. Avoiding the inherent broadband transmission line models can dramatically reduce the simulation time.

Figure 6.45: The linear amplifier schematic.

Figure 6.46: Linear amplifier DC characteristics.

# 6.8.1 Double Alexander-type CDR circuit

The schematic of the 2A CDR circuit is shown in fig. 6.47. Limiting, linear and differential amplifiers have been added.

Figure 6.47: Schematic of the double Alexander-type CDR circuit.

The clock signals will have to travel a relatively long distance between the clock buffers and the D latches. Each D latch is therefore fitted with a local clock input stage consisting of termination resistors, $R_i$, and emitter followers; cf. section 3.2. The 2A CDR circuit contains two clock buffers, one for Clk and one for Clk90. The clock buffers are based on open-collector transadmittance stages.
Each clock buffer is loaded with ten D latches. Impedance matching can be achieved by designing the transmission line impedance, $Z_0$, according to the number of loading latches, see fig. 6.48. The value of $Z_0$ can be locally adapted with respect to signal splits. The clock amplitude can be increased if the line impedances are chosen according to $Z_0 > R_i/n$, with $n$ being the number of loading D latches on the remaining transmission line. This is due to the inductive peaking effect of the transmission line; cf. section A.2. $Z_0/4$ is 2.5 times greater than $Z_0/10$, but such a wide range is not possible when using VIP-2; cf. section B.2. The transmission line is realized using M4 over M2 and thus has a practical range between 40 and 52 $\Omega$. The resulting mismatch causes reflections to occur.

Figure 6.48: The clock distribution network in the 2A CDR phase detector.

<sup>22</sup> Minh Lee.

The schematic of the 2A CDR circuit (fig. 6.47) reveals four serious problems with the 2A CDR circuit. These are:

1. Yield. The VIP-1 and VIP-2 processes have experienced a number of yield problems. Changes to design rules, discussions with Vitesse<sup>22</sup>, visual inspection of dies and measurements have revealed both antenna problems and faulty MIM-capacitors. The MIM-capacitors have been particularly difficult to process without a high probability of fatal short-circuits. The VIP-2 process has continuously been modified and design rules have been tightened to control these problems. The yield is also suffering from an unstable process. The yield varies greatly between consecutive MPW runs and has occasionally been below 10% for circuit sizes of about 1 mm² or less<sup>23</sup>. An unstable process will occasionally result in near zero yield and/or dismal performance, independent of the circuit parameters. However, the impact of randomly occurring process flaws can be limited by restricting the circuit area and the number of devices.
The massive size (2.060 mm²) and large number of devices (394 transistors) of the 2A CDR circuit make it vulnerable to such flaws. In conclusion, the yield is a real concern when using the VIP-2 process, and the size and complexity of the 2A CDR circuit stack the odds against it.
2. Heat. Many of the transistors are biased at a collector current density, J<sub>C</sub>, of 4.0 mA/μm² to maximize f<sub>t</sub> and f<sub>max</sub>; cf. section 2.3. This is done in all the critical parts of the design, such as the D latches. Each D latch has a current consumption of 32 mA and 20 D latches require 640 mA, equivalent to 2.7 W when using a nominal V<sub>SS</sub> of -4.2 V. The complete 2A CDR circuit requires 1.14 A, corresponding to 4.80 W<sup>24</sup>. This power manifests itself as heat. It would be very difficult to effectively dissipate this heat from a relatively small area (2.060 mm²). Thermal modeling is required to accurately predict the temperature of the die when it is mounted in various configurations. However, a low power VCO design (Colpitt II) failed with a current consumption of about 88 mA<sup>25</sup>, corresponding to 440 mW when $V_{SS} = -5.0$ V. The die was not properly attached to the substrate at the time<sup>26</sup>, but the substrate and the probe station chuck still provided a massive heat sink. The conclusion is that the heat generated by a circuit as complex as the 2A CDR is a serious threat to its safe operation and that the circuit will probably require active cooling.

   <sup>23</sup> The MUX, DEMUX and static divider submitted to VIP-2 MPW Run 1 had a yield of < 10%.

   <sup>24</sup> The current and power do not correspond to $V_{SS} = -4.2$ V. This is because the VCO circuit has a separate voltage supply of $V_{SS,VCO} = -5.0$ V. The VCO uses 190 mA.

3. Routing. The size of the layout made signal routing problematic.
The data and clock signals to the D latches and the latched data signals to the phase detector logic required very long transmission lines, and these were very difficult to match. The data and clock signals to the D latches have to propagate in the same direction to avoid reducing the duty cycle unnecessarily. Accessing the D latches with the clock and data signals simultaneously, while ensuring that the transmission lines were matched, required a fairly complex web of very long lines. A symmetric placement of subcircuits (to simplify matching) would require the VCO to be placed in the same spot as both the limiting amplifier and the phase detector logic, which is impossible in 2D. Differential signaling further aggravated the problem by requiring nearly twice the layout space relative to single-ended signaling, making the layout even more spacious and the signal lines correspondingly longer. It should be mentioned that several 40 Gb/s designs have forsaken differential signaling for both clock and data to simplify routing; cf. [3, 2, 4].
4. Skewing. The signal bandwidth is close to the operating speed of the phase detector logic. The logic requires input signals that are well defined, but differences in transmission lines and asymmetric gates cause the signals to become skewed as they pass through the logic gates. The phase detector logic, although not without merits<sup>27</sup>, is three levels deep. Each level increases the skewing, which has a major impact on the duty cycle of the result. The signal may shift as much as 2 ps, or 20% of the period of a single bit (10 ps). The propagation delay and fan-in requirements also mean that narrow signal pulses have difficulties in slipping through the logic.

These issues make it clear that the 2A CDR circuit has inherent weaknesses. The probability of a successful MPW run was deemed sufficiently low to warrant a redesign using a Hogge-type phase detector. This circuit is the topic of the following section.
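The heat and skewing arguments above rest on a few lines of arithmetic, which the following Python sketch reproduces. All input figures are taken from the text. The power density is a derived quantity that the text does not state explicitly, and the function names are merely illustrative.

```python
# Figures quoted in the text for the 2A CDR circuit.
LATCH_CURRENT_A = 0.032   # current consumption of one D latch
NUM_LATCHES = 20          # D latches in the 2A CDR circuit
SUPPLY_V = 4.2            # magnitude of the nominal V_SS
TOTAL_POWER_W = 4.80      # complete 2A CDR circuit
AREA_MM2 = 2.060          # layout area
BIT_RATE_HZ = 100e9       # 100 Gb/s, i.e. a bit period of 10 ps
SKEW_S = 2e-12            # possible signal shift through the logic


def latch_power_w() -> float:
    """Power drawn by all D latches at the nominal supply."""
    return NUM_LATCHES * LATCH_CURRENT_A * SUPPLY_V


def power_density_w_per_mm2() -> float:
    """Average power density over the whole layout (derived, not quoted)."""
    return TOTAL_POWER_W / AREA_MM2


def skew_fraction_of_bit() -> float:
    """Skew expressed as a fraction of one bit period."""
    return SKEW_S * BIT_RATE_HZ


if __name__ == "__main__":
    print(f"D latch power:  {latch_power_w():.2f} W")
    print(f"Power density:  {power_density_w_per_mm2():.2f} W/mm^2")
    print(f"Skew / bit:     {skew_fraction_of_bit():.0%}")
```

The latches alone account for more than half of the total power, which is consistent with the decision to reduce the number of latches in the redesign.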
The layout of the discarded 2A CDR circuit design is presented in fig. 6.49. The specifications for the same circuit are shown in table 6.3.

<sup>25</sup> The Colpitt II VCO had to be as small as possible to fit into an existing gap on the photoreticle. Vitesse offered the space for free and the limitation had to be respected. The Colpitt II was mirrored with a Colpitt II variant sharing the same DC pads, as shown in fig. 4.22. This resulted in a smaller combined area and the circuits fitted snugly into the reticle. The disadvantage of shared pads is that it also doubles the current consumption as both circuits share the supply pads. The current consumption is about 44 mA for each circuit at $V_{SS} = -5.0$ V or 88 mA for both.

<sup>26</sup> The die should be fixed to the substrate using a heat conducting glue to reduce the thermal resistance.

<sup>27</sup> The double Alexander-type design detects 75% of random transitions, as opposed to 50% with a standard Alexander-type design.

| Circuit                      | CDR (quadrature phase) |
|------------------------------|------------------------|
| Supply voltage (dual)        | -4.2/-5.0 V            |
| Current consumption (LA)     | 139 mA                 |
| Power consumption (LA)       | 584 mW                 |
| Current consumption (PD)     | 760 mA                 |
| Power consumption (PD)       | 3.19 W                 |
| Current consumption (Filter) | 60 mA                  |
| Power consumption (Filter)   | 252 mW                 |
| Current consumption (VCO)    | 35 mA                  |
| Power consumption (VCO)      | 175 mW                 |
| Current consumption (SDIV)   | 142 mA                 |
| Power consumption (SDIV)     | 596 mW                 |
| Current consumption (total)  | 1.14 A                 |
| Power consumption (total)    | 4.80 W                 |
| Width                        | 1593 μm                |
| Height                       | 1293 μm                |
| Area                         | 2.060 mm²              |
| Pad pitch                    | 150 μm                 |
| Transistor count             | 394                    |
| Resistor count               | 475                    |
| Capacitor count              | 6                      |

Table 6.3: Specifications for the CDR circuit (double Alexander-type phase detector).

Figure 6.49: Layout of the 2A CDR circuit.
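The totals in table 6.3 can be cross-checked against the individual entries. The Python sketch below does this under one assumption that the table does not state explicitly: the VCO is taken to run from the -5.0 V supply and every other block from the -4.2 V supply. With that assignment the per-block powers, the totals and the area all agree with the table to within rounding.

```python
from typing import Dict, Tuple

# Block name -> (current in mA, supply magnitude in V).
# The currents are from table 6.3; the supply assigned to each block is an
# assumption inferred from the dual -4.2/-5.0 V supply.
BLOCKS: Dict[str, Tuple[float, float]] = {
    "LA": (139, 4.2),
    "PD": (760, 4.2),
    "Filter": (60, 4.2),
    "VCO": (35, 5.0),
    "SDIV": (142, 4.2),
}


def total_current_ma(blocks: Dict[str, Tuple[float, float]]) -> float:
    """Sum of the block currents in mA."""
    return sum(current for current, _ in blocks.values())


def total_power_mw(blocks: Dict[str, Tuple[float, float]]) -> float:
    """Sum of current times supply voltage over all blocks, in mW."""
    return sum(current * supply for current, supply in blocks.values())


def area_mm2(width_um: float, height_um: float) -> float:
    """Layout area in mm^2 from width and height in micrometres."""
    return width_um * height_um * 1e-6


if __name__ == "__main__":
    print(f"Total current: {total_current_ma(BLOCKS) / 1e3:.2f} A")
    print(f"Total power:   {total_power_mw(BLOCKS) / 1e3:.2f} W")
    print(f"Area:          {area_mm2(1593, 1293):.3f} mm^2")
```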
### 6.8.2 Hogge-type CDR circuit

It was decided that the existing design would be too sensitive to the above-mentioned factors, resulting in an unreliable design. The architecture had to be drastically simplified. Reducing the phase detector to a single Alexander-type was considered, but did not go far enough. Eventually a half-rate design using a Hogge-type phase detector was introduced and implemented. The resulting design is shown in fig. 6.50. The schematic is much more elegant, but the perceived simplicity belies several problems:

- The data and clock signaling in the core now requires more bandwidth relative to the previous design. Losses and matching, both impedance and between lines, become even more important.
- The secondary latches must also be high-speed designs, requiring more circuitry and greater power consumption<sup>28</sup>.
- The linear phase detector is very sensitive to skewing by delays in latches and logic.

Figure 6.50: Architecture of full-rate type CDR

The limiting amplifier is similar to the design used in the previous architecture, but now it has less loading because it only has to distribute the data signal to two inputs (and also smaller losses due to the shorter lines). This required some resizing of components, but no further changes. The slower, secondary latches were replaced with the fast front-end model. The logic gates were maintained, but a new type of filter was required. The new filter had a much smaller output range and required a linear amplifier. The amplifier was necessary to maintain the dynamic frequency range of the VCO, and thus the bit rates which the CDR circuit is able to capture.

A layout of the CDR circuit is shown in fig. 6.51. A photo of the same circuit is shown in fig. 6.52. The details will be discussed in the following sections. The specifications for the Hogge-type CDR circuit are shown in table 6.4.
The circuit has roughly half the size and requires only a third of the power, relative to the 2A design.

No paper dealing with PLLs is complete without a mention of the most basic theory, which has been given in chapter 5. The components of the charge-pump filter were, however, adjusted using the simplified AHDL model and verified through a full simulation at a few different data rates near $100~{\rm Gb/s}$.

<sup>28</sup> Switching is improved by maximizing $f_t$ and $f_{max}$ in key transistors, which occurs at 3.5-4.5 mA/μm². Power can be saved by using much smaller current densities if speed is not the key objective or even a disadvantage.

Figure 6.51: The layout of the Hogge-type CDR circuit.

# 6.9 CDR circuit measurements

The half-rate CDR circuit with a Hogge-type phase detector has been delivered from Vitesse (Run 6). A total of ten separate dies have arrived. A stand-alone 50 GHz VCO circuit was submitted along with the CDR circuit. The VCO is represented on each of the ten dies and is located close to the CDR circuit. The stand-alone VCO is identical to the VCO used in the CDR circuit. The purpose of the stand-alone VCO was to measure the performance of the VCO separately. The oscillation frequency and dynamic frequency range are of particular interest. Experience from previous VCO measurements showed that these parameters would be process dependent. The VCO measurements could also be compared to previous measurements if the CDR circuit failed to perform as expected.

All of the dies were mounted on a brass substrate using a flip chip-machine. The substrate simplifies the handling of the chip, particularly when placing the probes. More importantly, the substrate also serves as a heat sink<sup>29</sup>. The dies adhere to the substrate through a thin veneer of glue with a very low thermal resistance.
The glue gives the die a much better thermal contact with the substrate than what can be achieved by merely placing the die on the substrate.

<sup>29</sup> Previous VCO measurements showed that unmounted circuits could fail rapidly; cf. section 4.3. The CDR circuit consumes much more power than a VCO, and the heating problem is correspondingly larger.

#### 6.9.1 VCO measurement results

The VCO circuit is designed to have a center frequency of 50 GHz and is a variation of the Colpitt VCO circuit #2. A photo of the circuit is shown in fig. 6.53. The spectrum analyzer has a nominal limit of 50 GHz, but it can be used up to about 52.5 GHz. The actual center frequency should then be within the bandwidth of the spectrum analyzer. This simplifies measurements because the clock signal can be measured directly, i.e. without resorting to external mixing. The lower frequency also reduces losses in signal amplitude, though the results are automatically compensated for losses in either case.

Figure 6.52: A photo of the Hogge-type CDR circuit.

The measurement setup is shown in fig. 6.54. Both $V_{tune}$ and $V_{SS}$ are variable. $V_{tune}$ is used to test the dynamic frequency range. A variable $V_{SS}$ makes it possible to find the precise voltage where the tank amplification is >1. Having every transistor in the VCO correctly biased over an entire period requires a $V_{SS}$ of -4.3 V. The VCO will start to oscillate at a higher voltage<sup>30</sup>, but output power etc. will be limited. Knowing if the VCO could operate sufficiently well at a higher voltage could help reduce the overall power consumption of the CDR circuit.

All of the circuits were tested in the 0-50 GHz, 50-75 GHz and 75-110 GHz ranges. The results of the measurements are shown in table 6.5. All of the VCO circuits with a detectable spectral component were found to operate in the lower end of the V-band<sup>32</sup>. The performance is dismal.
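The power and voltage columns of table 6.5 are related through the load impedance. The Python sketch below assumes a 50 Ω load and that the voltage column lists the peak amplitude of a sine wave. Neither assumption is stated explicitly in the text, but together they reproduce the 62 mV quoted for die 2 (-14.15 dBm) and the 13.7 dB shortfall relative to a 300 mV level.

```python
import math


def dbm_to_peak_volts(power_dbm: float, load_ohm: float = 50.0) -> float:
    """Peak voltage of a sine wave that delivers power_dbm into load_ohm.

    The 50 ohm default is an assumption, not a value quoted in the text.
    """
    power_w = 1e-3 * 10 ** (power_dbm / 10)
    return math.sqrt(2 * power_w * load_ohm)


def shortfall_db(v_measured: float, v_target: float) -> float:
    """How far v_measured lies below v_target, in dB (voltage ratio)."""
    return 20 * math.log10(v_target / v_measured)


if __name__ == "__main__":
    v_die2 = dbm_to_peak_volts(-14.15)
    print(f"Die 2 amplitude: {v_die2 * 1e3:.0f} mV")
    print(f"Below 300 mV by: {shortfall_db(0.062, 0.300):.1f} dB")
```

The same conversion applied to -23.62 dBm gives about 21 mV, matching the entry for die 3.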
Two of the VCO circuits show no response when the supply voltage is applied. A third circuit has a very low current consumption, but no output signal can be discerned within the 0-110 GHz band. The rest of the circuits have a very low output voltage. Die 2 has the best performance, but the output voltage amplitude is only equivalent to 62 mV, which is 13.7 dB below the target. The VCO tank is buffered and limited to about 300 mV, which indicates that the voltage amplitude in the tank must be very low.

<sup>30</sup> These are negative values. -4 is higher than -5.

<sup>32</sup> 50-75 GHz.

| Circuit                      | CDR (Differential phase) |
|------------------------------|--------------------------|
| Supply voltage (dual)        | -4.2/-5.0 V              |
| Current consumption (LA)     | 116 mA                   |
| Power consumption (LA)       | 487 mW                   |
| Current consumption (PD)     | 152 mA                   |
| Power consumption (PD)       | 638 mW                   |
| Current consumption (Filter) | 60 mA                    |
| Power consumption (Filter)   | 252 mW                   |
| Current consumption (VCO)    | 44 mA                    |
| Power consumption (VCO)      | 220 mW                   |
| Current consumption (total)  | 372 mA                   |
| Power consumption (total)    | 1.60 W                   |
| Width                        | 1350 μm                  |
| Height                       | 869 μm                   |
| Area                         | 1.173 mm²                |
| Pad pitch                    | 150 μm                   |
| Transistor count             | 208                      |
| Resistor count               | 229                      |
| Capacitor count              | 7                        |

Table 6.4: Specifications for CDR circuit (Hogge-type phase detector).

| Run 7, Die #, VCO50 | $I_{CC}$ (mA)<sup>31</sup> | $f_{osc}$ (GHz) | $P_{out}$ (dBm) | $V_{out}$ (mV) |
|---------------------|----------------------------|-----------------|-----------------|----------------|
| 1                   | 0                          | N/A             | N/A             | 0              |
| 2                   | 100                        | 53.42           | -14.15          | 62             |
| 3                   | 100                        | 54.42           | -23.62          | 21             |
| 4                   | 0                          | N/A             | N/A             | 47             |
| 5                   | 100                        | 54.67           | -16.65          | 21             |
| 6                   | 40                         | N/A             | N/A             | 0              |
| 7                   | 100                        | 54.58           | -16.98          | 45             |
| 8                   | 100                        | 52.75           | -16.48          | 47             |
| 9                   | 100                        | 54.69           | -16.82          | 46             |
| 10                  | 100                        | 54.33           | -15.98          | 50             |

Table 6.5: The measured data for the 50 GHz Colpitt VCO (Run 7).

Figure 6.53: A photo of 50 GHz version of the Colpitt VCO circuit #2.

The inescapable conclusion is that Run 7 is suffering from very poor yield. The VIP-2 process has previously experienced yield problems related to the MIM-capacitor process stages. These errors manifested themselves as short-circuits, probably through the capacitor dielectric. None of the VCO circuits have a short-circuit. The origin of this problem is therefore different. A DEMUX circuit submitted by Joakim Hallin also suffered from poor yield, with no operational circuits to be found. Vitesse (Max Helix) has later confirmed that the yield of Run 7 was zero or effectively zero.

Measurements of the only functional 50 GHz VCO showed an increase in the voltage supply noise level once the VCO started to oscillate. This is due to the slightly imperfect on-chip decoupling of the voltage supply.

#### 6.9.2 CDR measurement results

The test results in table 6.5 from the 50 GHz version of the Colpitt VCO circuit 2 are a clear indication that this particular process run has a very low yield.

The CDR circuit was first subjected to an input data stream consisting of an 80 to 120 Gb/s sequence of alternating bits, generated by a 60 GHz synthesizer. The independent VCO data indicated that the CDR circuit would lock at slightly above 100 Gb/s, but that turned out not to be the case. Synchronizing the synthesizer with the oscilloscope at 40 Gb/s resulted in a nicely shaped synchronized output, well outside the locking range of the CDR circuit. This was also unexpected, because an operational CDR circuit should keep latching bits all across the duty cycle at this bitrate, and the output should be completely asynchronous.

Figure 6.54: Measurement setup for 50 GHz VCO.

The CDR circuit appears to be transparent to the input data, i.e. the output is identical to the input and no retiming is taking place.
This indicates that the D latches are in a metastable state, which can only occur if there is no clock signal. It was possible to detect a small increase in noise level in the supply voltage when the working VCO circuits of the previous tape-outs started to oscillate. This occurred when the supply voltage reached -4.3 V. There was no such indication with the CDR circuit, where the noise appeared to be constant as the supply voltage was decreased. The conclusion must be that the VCO circuit of the CDR does not oscillate, or does so at a very small voltage amplitude, which does not overcome the metastable state of the D latches.

Regrettably, Vitesse has ceased all InP process activities, and is possibly attempting to sell the VIP-2 process. The MPW run 6, containing the CDR circuit, was to be the final run<sup>33</sup>. It is possible that the production problems and the resulting low yield were related to the shortage of process specialists, which occurred when the VIP-2 cancellation was announced.

<sup>33</sup> Another MPW run was performed later, probably because of (paying) customer dissatisfaction with the dismal yield of Run 6.

# 6.10 Suggestions and improvements

No perfect circuit was ever made. Alternative implementations suggest themselves during a project, and shortcomings cannot always be addressed within a tight tapeout schedule. Some possible improvements of the CDR circuits are the topic of this section.

### 6.10.1 Design rule restrictions

Production. Vitesse has repeatedly experienced yield problems related to the MIM-capacitors of the VIP-2 process. The problem has gradually, but unevenly, been contained. The improvement came at the cost of ever stricter design rules, making the capacitors more difficult to place and use. The design rules effectively limit the DC current density through the capacitor. This makes it difficult to place a decoupling capacitor directly between a supply pad and a circuit with a significant current consumption.
Improving the MIM-process stages while loosening the design rules would result in doubling the effective on-chip decoupling capacitance of the supply voltage.

### 6.10.2 Circuit improvements

The Alexander-type phase detector core provides pulses to the phase detector logic that are relatively wide and well defined. Those qualities ensure that the phase detector logic will yield control signals that are reliable, despite the various shortcomings of the logical gates. The Hogge-type phase detector is much less sensitive, as the phase detector core generates linear, and possibly very narrow, pulses. The phase detector logic has difficulties in dealing with short pulses because of propagation delay and inherent skewing in the asymmetric logical gates. The result is that very short pulses will not be able to filter through. The propagation delay is difficult to improve, but it should be possible to implement a symmetric XOR gate; cf. [12].

The charge-pump filter should be optimized to comply with the jitter performance specifications described in [112]. It was not possible to do this because of time constraints<sup>34</sup>.

The frequency tuning range of the VCO has to encompass the precisely defined data rate<sup>35</sup> of the system while allowing for process variation. A larger tuning range would make this goal easier to achieve. Increasing the tuning voltage range also increases the frequency tuning range. The output signal from the charge-pump filter is weak, and requires a linear amplifier to enhance the tuning voltage range. The current linear amplifier design yields a mere 0-400 mV signal, which is the range of the tuning voltage. The amplification of the linear amplifier could either be improved, cf. the amplifiers in [10, 11], or the amplifier could be used as the first stage of a two stage amplifier.

The biasing of the VCO tuning voltage is important, as shown in chapter 4.
The change in frequency, $\Delta f_{VCO}$, is related to both the change in tuning voltage, $\Delta V_{tuning}$, and the tuning voltage DC-offset, $V_{bias}$:

$$\frac{\Delta f_{VCO}}{\Delta V_{tuning}} \propto A\left(V_{bias}\right) \tag{6.9}$$

Adjusting $V_{bias}$ (by level shifting) to an appropriate level would improve the tuning range, though a noise penalty would be incurred.

<sup>34</sup> The final VIP-2 tape-out was moved ahead of schedule.

<sup>35</sup> The specifications for 100 Gb/s Ethernet have not yet been completed.

### 6.10.3 The possibility of achieving 120 or 160 Gb/s

Most of the components can be reused and enhanced for a 120 or $160~{\rm Gb/s}$ version. However, some components will be more troublesome than others.

The LA requires a higher bandwidth, but this could be achieved if the amplification requirements were relaxed.

The phase detector is a critical component. Hogge-type phase detectors are, as previously mentioned, very sensitive to signal shape and latency. It is highly unlikely that such an architecture can successfully generate useful control signals at 160 Gb/s (using the VIP-2 process). A more robust (but complex, large, high power, noisy etc.) Alexander-type [1, 8, 9] or double Alexander-type phase detector (perhaps similar to [3, 2, 4], but with only three columns) could possibly do the trick, judging from the 165 Gb/s performance of the D flip-flops [74, 115, 75, 76]. The phase detector should require slightly more fan-out from the D latches than a DEMUX, and it is an open question whether this can be achieved at 160 Gb/s.

Both the charge-pump filter and the linear amplifier should scale well. In particular, the bandwidth requirement of the linear amplifier is far below its actual performance.

The Hogge-type phase detector would require a 60 GHz (120 Gb/s) or 80 GHz (160 Gb/s) VCO. This is not difficult, because a 100 GHz VCO design is already available; cf. chapter 4.
However, the Alexander-type phase detector requires a quadrature half-rate clock. This clock must be created using a 160 GHz clock combined with a 160 GHz divider; cf. chapter 3. A 160 GHz clock appears feasible, but the varactor capacitance would have to be very small. The parasitic capacitance of the tank would be comparable to the varactor capacitance, and probably larger. The (variable) varactor capacitance would only be a small part of the total tank capacitance, thus seriously reducing the frequency tuning range. A small frequency tuning range would largely defeat the purpose of a variable oscillator, and it would become correspondingly difficult to place the frequency range across the data rate while taking process variation into account.

160 GHz is probably beyond the capacity of the static divider based on VIP-2; cf. [79, 77, 78]. However, a dynamic divider is realistic in this case. A (very useful) 212 GHz dynamic divider has previously been designed at MC2, also using the VIP-2 process<sup>36</sup> [77].

<sup>36</sup> Possibly the VIP-1 process.

### Conclusion

This thesis has demonstrated that it is possible to make three of the circuits required for a 100 Gb/s optical/electrical transceiver, using the VIP-2 InP DHBT process supplied by Vitesse.

A 114 GHz static divider has been designed and tested. The static divider has been implemented both as a stand-alone chip, and in combination with the Colpitt and LC VCOs to form a half-rate quadrature clock generator.

Three different VCOs have been designed and tested, operating at frequencies of 75 GHz, 86 GHz and 97 GHz respectively. Two of these are Colpitt type designs, and one is an LC type. All three circuits exhibit a phase noise between 80-83 dBc/Hz at 1 MHz offset, and have an output power of -9 to -6 dBm (differential). The output power could have been considerably higher if necessary, but it was not a requirement in this particular case.
A complete 100 Gb/s CDR circuit has been designed and simulated, but has not yet been delivered from Vitesse. The CDR incorporates a limiting amplifier, as well as a 1:2 DEMUX. Suggestions for improving the performance of the circuits have been presented. It should be mentioned that the OptCom research group has also been involved in the design of other 100 Gb/s transceiver circuits [75], e.g. a 165 Gb/s 4:1 MUX [74] and a 110 Gb/s $2^9 - 1$ PRBS generator with error detector [115]. Some of these circuits are currently in the process of encapsulation. ## Acknowledgment I would like to thank the OptCom group at Chalmers, Torgil Kjellberg, Joakim Hallin, Thomas Swahn & Camilla Kärnfelt for our many discussions relating to circuit design, compound semiconductors, Linux, cross-country skiing and many other delightful subjects. I am grateful for the help I received with the flip-chip machine and during the long hours in the measurement lab. I am also indebted to Guðjón Guðjónsson for his endless patience and many practical hints and suggestions during the writing process. Without him, I'd still be struggling with LyX like Sisyphus, trying in vain to integrate the mess of text, data and pictures into the current shape of an elegant and coherent thesis. Last, but not least, this thesis owes much to the near infinite patience of my supervisor, professor Lars Dittmann, and the encouragements of his two sidekicks, the esteemed Dr. Michael Berger and Dr. Brian Mortensen. ## Bibliography - [1] M. Wurzer, J. Bock, H. Knapp, W. Zirwas, F. Schumann, and A. Felder, "A 40 Gb/s integrated clock and data recovery circuit in a 50-GHz $f_T$ silicon bipolar technology," *IEEE Journal of Solid-State Circuits*, vol. 34, pp. 1320–1324, September 1999. - [2] G. Georgiou, Y. Baeyens, Y.-K. Chen, A. H. Gnauck, C. Gröpper, P. Paschke, R. Pullela, M. Reinhold, C. Dorschky, J.-P. Mattia, T. W. von Mohrenfels, and C. 
Schulien, "Clock and data recovery IC for 40-Gb/s fiber-optic receiver," *IEEE Journal of Solid-State Circuits*, vol. 37, pp. 1120–1125, September 2002.
- [3] M. Reinhold, C. Dorschky, E. Rose, R. Pullela, P. Mayer, F. Kunz, Y. Baeyens, T. Link, and J.-P. Mattia, "A fully integrated 40-Gb/s clock and data recovery IC with 1:4 DEMUX in SiGe technology," *IEEE Journal of Solid-State Circuits*, vol. 36, pp. 1937–1945, December 2001.
- [4] M. Reinhold, T. W. von Mohrenfels, F. Kunz, E. Rose, A. Eismann, M. Kukiela, C. Wolf, F. Znidarsic, C. Dorschky, and G. Röll, "A 40/43-Gb/s CDR/DEMUX and MUX chipset integrated on MCM-ceramic with 3R-regeneration functionality," in *2003 IEEE MTT-S International Microwave Symposium Digest*, vol. 2, pp. 1185–1188, IEEE, June 8-13 2003.
- [5] K. Ishii, H. Nosaka, H. Nakajima, K. Kurishima, M. Ida, N. Watanabe, Y. Yamane, E. Sano, and T. Enoki, "1-W DEMUX and one-chip CDR with 1:4 DEMUX for 10 Gbit/s optical communication systems," in *Gallium Arsenide Integrated Circuit (GaAs IC) Symposium*, pp. 101–105, IEEE, October 24 2001.
- [6] K. Ishii, H. Nosaka, H. Nakajima, K. Kurishima, M. Ida, N. Watanabe, Y. Yamane, E. Sano, and T. Enoki, "Low-power 1:16 DEMUX and one-chip CDR with 1:4 DEMUX using InP-InGaAs heterojunction bipolar transistors," *IEEE Journal of Solid-State Circuits*, vol. 37, pp. 1146–1151, September 2002.
- [7] H. Nosaka, E. Sano, K. Ishii, M. Ida, K. Kurishima, S. Yamahata, T. Shibata, H. Fukuyama, M. Yoneyama, T. Enoki, and M. Muraguchi, "A 39-to-45-Gbit/s multi-data-rate clock and data recovery circuit with a robust lock detector," *IEEE Journal of Solid-State Circuits*, vol. 39, pp. 1361–1365, August 2004.
- [8] S. Nielsen, J. C. Yen, N. K. Srivastava, J. E. Rogers, M. G. Case, and R. Thiagarajah, "A fully integrated 43.2-Gb/s clock and data recovery and 1:4 demux IC in InP DHBT technology," *IEEE Journal of Solid-State Circuits*, vol. 38, pp. 2341–2346, December 2003.
- [9] T. W. Krawczyk, S. A. Steidl, R.
Alexander, J. Pulver, G. Kowalski, C. Hornbuckle, and D. Rowe, "A 39.8Gb/s to 43.1Gb/s SFI-5 compliant 16:1 multiplexer and demultiplexer for optical communication systems," in *Proceedings of the Custom Integrated Circuits Conference*, pp. 581–584, IEEE, September 21-24 2003.
- [10] R.-E. Makon, R. Driad, K. Schneider, M. Ludwig, R. Aidam, R. Quay, M. Schlechtweg, and G. Weimann, "80 Gbit/s monolithically integrated clock and data recovery circuit with 1:2 DEMUX using InP-based DHBTs," in *Compound Semiconductor Integrated Circuit Symposium, 2005*, pp. 268–271, IEEE, 30 October-2 November 2005.
- [11] R. E. Makon, R. Driad, K. Schneider, M. Ludwig, R. Aidam, R. Quay, M. Schlechtweg, and G. Weimann, "InP DHBT-based monolithically integrated CDR/DEMUX IC operating at 80 Gb/s," *IEEE Journal of Solid-State Circuits*, vol. 41, pp. 2215–2223, October 2006.
- [12] J. Savoj and B. Razavi, "A 10-Gb/s CMOS clock and data recovery circuit with a half-rate linear phase detector," *IEEE Journal of Solid-State Circuits*, vol. 36, pp. 761–768, May 2001.
- [13] C. Kromer, G. Sialm, C. Menolfi, M. Schmatz, F. Ellinger, and H. Jackel, "A 25-Gb/s CDR in 90-nm CMOS for high-density interconnects," *IEEE Journal of Solid-State Circuits*, vol. 41, pp. 2921–2929, December 2006.
- [14] T. Toifl, C. Menolfi, P. Buchmann, C. Hagleitner, M. Kossel, T. Morf, J. Weiss, and M. Schmatz, "A 72mW 0.003mm<sup>2</sup> inductorless 40Gb/s CDR in 65nm SOI CMOS," in *International Solid-State Circuits Conference (ISSCC)*. Digest of Technical Papers., pp. 226–228, IEEE, February 11-15 2007.
- [15] L.-C. Cho, C. Lee, and S.-I. Liu, "A 33.6-to-33.8Gb/s burst-mode CDR in 90nm CMOS," in *International Solid-State Circuits Conference (ISSCC)*. Digest of Technical Papers., pp. 48–50, IEEE, February 11-15 2007.
- [16] S. Palermo, A. Emami-Neyestanak, and M. Horowitz, "A 90nm CMOS 16Gb/s transceiver for optical interconnects," in *International Solid-State Circuits Conference (ISSCC)*.
Digest of Technical Papers., pp. 44–46, IEEE, February 11-15 2007.
- [17] J. Lee and M. Liu, "A 20Gb/s burst-mode CDR circuit using injection-locking technique," in *International Solid-State Circuits Conference (ISSCC)*. Digest of Technical Papers., pp. 46–48, IEEE, February 11-15 2007.
- [18] N. Nedovic, N. Tzartzanis, H. Tamura, F. M. Rotella, M. Wiklund, Y. Mizutani, Y. Okaniwa, T. Kuroda, J. Ogawa, and W. W. Walker, "A 40-44 Gb/s 3x oversampling CMOS CDR/1:16 DEMUX," *IEEE Journal of Solid-State Circuits*, vol. 42, pp. 2726–2735, December 2007.
- [19] H. Noguchi, K. Hosoya, R. Ohhira, H. Uchida, A. Noda, N. Yoshida, and S. Wada, "A 35-to-46-Gb/s ultra-low jitter clock and data recovery circuit for optical fiber transmission systems," in *Compound Semiconductor Integrated Circuit (CSIC) Symposium*, pp. 1–4, IEEE, October 14-17 2007.
- [20] T. Morikawa, M. Soda, S. Shioirl, T. Hashimoto, F. Sato, and K. Emura, "A SiGe single-chip 3.3 V receiver IC for 10 Gb/s optical communication system," in *1999 IEEE International Solid-State Circuits Conference (ISSCC). Digest of Technical Papers*, pp. 380–381, IEEE, February 1999.
- [21] Y. M. Greshishchev and P. Schvan, "SiGe clock and data recovery IC with linear-type PLL for 10-Gb/s SONET application," *IEEE Journal of Solid-State Circuits*, vol. 35, pp. 1353–1358, September 2000.
- [22] Y. M. Greshishchev, P. Schvan, J. L. Showell, M.-L. Xu, J. J. Ojha, and J. E. Rogers, "A fully integrated SiGe receiver IC for 10-Gb/s data rate," *IEEE Journal of Solid-State Circuits*, vol. 35, pp. 1949–1957, December 2000.
- [23] M. Meghelli, B. Parker, H. Ainspan, and M. Soyuer, "SiGe BiCMOS 3.3-V clock and data recovery circuits for 10-Gb/s serial transmission systems," *IEEE Journal of Solid-State Circuits*, vol. 35, pp. 1992–1995, December 2000.
- [24] S. Ueno, T. Harada, K. Watanabe, T. Kato, T. Shinohara, K. Mikami, T. Hashimoto, A. Takai, K. Washio, and R.
Takeyari, "A single-chip 10 Gb/s transceiver LSI using SiGe SOI/BiCMOS," in *2001 IEEE International Solid-State Circuits Conference. Digest of Technical Papers*, pp. 82–83, IEEE, February 7 2001.
- [25] D. Friedman, M. Meghelli, B. Parker, J. Yang, H. Ainspan, and M. Soyuer, "A single-chip 12.5 Gbaud transceiver for serial data communication," in *2001 Symposium on VLSI Circuits. Digest of Technical Papers.*, pp. 145–148, IEEE, June 2001.
- [26] H. Nosaka, E. Sano, K. Ishii, M. Ida, K. Kurishima, T. Enoki, and T. Shibata, "A fully integrated 40-Gbit/s clock and data recovery circuit using InP/InGaAs HBTs," in *2002 IEEE MTT-S International Microwave Symposium Digest*, pp. 83–86, IEEE, June 7 2002.
- [27] H. Noguchi, T. Tateyama, M. Okatomo, H. Uchida, M. Kimura, and K. Takahashi, "A 9.9G-10.8Gb/s rate-adaptive clock and data-recovery with no external reference clock for WDM optical fiber transmission," in *2002 IEEE International Solid-State Circuits Conference. Digest of Technical Papers*, pp. 202–472, IEEE, February 7 2002.
- [28] J. E. Rogers and J. R. Long, "A 10-Gb/s CDR/DEMUX with LC delay line VCO in 0.18-μm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 37, pp. 1781–1789, 2002.
- [29] J. Yen, M. G. Case, S. Nielsen, J. E. Rogers, N. K. Srivastava, and R. Thiagarajah, "A fully integrated 43.2 Gb/s clock and data recovery and 1:4 DEMUX IC in InP HBT technology," in *IEEE International Solid-State Circuits Conference (ISSCC)*. Digest of Technical Papers., pp. 240–491, IEEE, February 13 2003.
- [30] K. Krishnamurthy, I. Gontijo, S. Vu, C. Winczewski, Y. zen Liu, R. Pullela, M. Rodwell, R. Vetury, J. Xu, A. Shou, S. Jaganathan, K. Cheng, J. Chow, D. Mensa, and L. Zhang, "40 Gb/s TDM system using InP HBT IC technology," in *2003 IEEE MTT-S International Microwave Symposium Digest*, pp. 1189–1192, IEEE, June 8-13 2003.
- [31] F. Centurelli, A. Golfarelli, J. Guinea, L. Masini, D. Morigi, M. Pozzoni, G. Scotti, and A.
Trifiletti, "A 10 Gb/s CDR in SiGe BiCMOS commercial technology with multistandard capability," in *Radio Frequency Integrated Circuits (RFIC) Symposium*, pp. 317–320, IEEE, June 10 2003.
- [32] A. Ong, S. Benyamin, J. Cancio, V. Condito, T. Labrie, J. P. Mattia, D. K. Shaeffer, A. Shahani, X. Si, H. Tao, M. Tarsia, W. Wong, and M. Xu, "A 40-43-Gb/s clock and data recovery IC with integrated SFI-5 1:16 demultiplexer in SiGe technology," *IEEE Journal of Solid-State Circuits*, vol. 38, pp. 2155–2168, December 2003.
- [33] M. Ramezani, C. Andre, and T. Salama, "A 10Gb/s CDR with a half-rate bang-bang phase detector," in *Proceedings of the 2003 International Symposium on Circuits and Systems (ISCAS)*, vol. 2, pp. II–II, IEEE, May 28 2003.
- [34] A. Rezayee and K. Martin, "A 9-16Gb/s clock and data recovery circuit with three-state phase detector and dual-path loop architecture," in *Proceedings of the 29th European Solid-State Circuits Conference (ESSCIRC)*, pp. 683–686, IEEE, September 18 2003.
- [35] B. Wu, Y.-H. Sutu, K. Ramamurthy, D. Zheng, E. Cheung, T. Tran, Y. Jiang, and M. Rana, "A serial 10 gigabit Ethernet transceiver on digital 0.13μm CMOS," in *Proceedings of the 29th European Solid-State Circuits Conference (ESSCIRC)*, pp. 197–200, IEEE, September 18 2003.
- [36] R. Kreienkamp, U. Langmann, C. Zimmermann, and T. Aoyama, "A 10-Gb/s CMOS clock and data recovery circuit with an analog phase interpolator," in *Proceedings of the IEEE 2003 Custom Integrated Circuits Conference*, pp. 73–76, IEEE, September 24 2003.
- [37] W. Rhee, H. Ainspan, S. Rylov, A. Rylyakov, M. Beakes, D. Friedman, S. Gowda, and M. Soyuer, "A 10-Gb/s CMOS clock and data recovery circuit using a secondary delay-locked loop," in *Proceedings of the IEEE 2003 Custom Integrated Circuits Conference*, pp. 81–84, IEEE, September 24 2003.
- [38] J. Lee and B. Razavi, "A 40-Gb/s clock and data recovery circuit in 0.18-μm CMOS technology," *IEEE Journal of Solid-State Circuits*, vol. 38, pp.
2181–2190, December 2003.
- [39] L. Henrickson, P. Wu, S. Quadri, D. Crosbie, D. Shen, U. Nellore, A. Ellis, J. Oh, H. Wang, G. Capriglione, A. Atesoglu, and A. Yang, "Low-power fully integrated 10-Gb/s SONET/SDH transceiver in 0.13-μm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 38, pp. 1595–1601, October 2003.
- [40] H. Nosaka, K. Ishii, T. Enoki, and T. Shibata, "A 10-Gb/s data-pattern independent clock and data recovery circuit with a two-mode phase comparator," *IEEE Journal of Solid-State Circuits*, vol. 38, pp. 192–197, February 2003.
- [41] S. Kaeriyama and M. Mizuno, "A 10Gb/s 50mW 120×130μm<sup>2</sup> clock and data recovery circuit," in *2003 IEEE International Solid-State Circuits Conference (ISSCC). Digest of Technical Papers.*, pp. 470–478, IEEE, February 13 2003.
- [42] B.-J. Lee, M.-S. Hwang, S.-H. Lee, and D.-K. Jeong, "A 2.5-10-Gb/s CMOS transceiver with alternating edge-sampling phase detection for loop characteristic stabilization," *IEEE Journal of Solid-State Circuits*, vol. 38, pp. 1821–1829, November 2003.
- [43] T.-S. Chen, Y.-B. Luo, and L.-R. Huang, "A 10 Gb/s clock and data recovery circuit with binary phase/frequency detector using TSMC 0.35 μm SiGe BiCMOS process," in *The 2004 IEEE Asia-Pacific Conference on Circuits and Systems. Proceedings.*, pp. 981–984, IEEE, December 9 2004.
- [44] J. Takasoh, T. Yoshimura, H. Kondoh, and N. Higashisaka, "A 12.5Gbps half-rate CMOS CDR circuit for 10Gbps network applications," in *Symposium on VLSI Circuits. Digest of Technical Papers*, pp. 268–271, IEEE, June 19 2004.
- [45] Y. Ohtomo, T. Kawamura, K. Nishimura, M. Nogawa, H. Koizumi, and M. Togashi, "A 12.5Gb/s CMOS BER test using a jitter-tolerant parallel CDR," in *International Solid-State Circuits Conference (ISSCC)*, pp. 174–520, IEEE, February 19 2004.
- [46] H. Werker, M. Vena, A. Melodia, J. Fisher, G. de Mercey, H. Geib, S. Meching, C. Holuigue, C. Ebner, G. Mitteregger, E. Romani, F. Roger, T. Blon, and M.
Moyal, "A 10 Gb/s SONET-compliant CMOS transceiver with low cross-talk and intrinsic jitter," *IEEE Journal of Solid-State Circuits*, vol. 39, pp. 2349–2358, December 2004.
- [47] H. S. Muthali, T. P. Thomas, and I. A. Young, "A CMOS 10-Gb/s SONET transceiver," *IEEE Journal of Solid-State Circuits*, vol. 39, pp. 1026–1033, July 2004.
- [48] Z. Gu and A. Thiede, "10 GHz full-rate clock and data recovery circuit in 0.18 μm CMOS without external reference clock," *Electronics Letters*, vol. 40, p. 25, December 2004.
- [49] X. Chen and M. M. Green, "A CMOS 10 Gb/s clock and data recovery circuit with a novel adjustable $k_{pd}$ phase detector," in *Proceedings of the 2004 International Symposium on Circuits and Systems (ISCAS)*, pp. IV–301, IEEE, May 26 2004.
- [50] Z. Lao, K. Guinn, M. Delaney, J. Jensen, M. Sokolich, S. Thomas, and C. Fields, "A packaged 43-Gb/s clock and data recovery IC," in *Compound Semiconductor Integrated Circuit Symposium (CSICS)*, 2005.
- [51] F. Centurelli, A. Golfarelli, J. Guinea, L. Masini, D. Morigi, M. Pozzoni, G. Scotti, and A. Trifiletti, "A 10-Gb/s CMU/CDR chip-set in SiGe BiCMOS commercial technology with multistandard capability," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 13, pp. 191–200, February 2005.
- [52] J.-H. C. Zhan, J. S. Duster, and K. T. Kornegay, "Full-rate injection-locked 10.3 Gb/s clock and data recovery circuit in a 45 GHz-$f_T$ SiGe process," in *Proceedings of the IEEE Custom Integrated Circuits Conference*, pp. 552–555, IEEE, September 2005.
- [53] D. Kucharski and K. T. Kornegay, "A 43-45Gb/s 2.5V integrated clock and data recovery circuit in SiGe using low-voltage topologies," in *Proceedings of the Bipolar/BiCMOS Circuits and Technology Meeting*, pp. xxiii+287, IEEE, October 11 2005.
- [54] M. Nogawa, K. Nishimura, S. Kimura, T. Yoshida, T. Kawamura, M. Togashi, K. Kumozaki, and Y.
Ohtomo, "A 10 Gb/s burst-mode CDR IC in 0.13 μm CMOS," in *International Solid-State Circuits Conference (ISSCC)*. Digest of Technical Papers., vol. 2, pp. 228–595, IEEE, February 10 2005.
- [55] Y. Tomita, M. Kibune, J. Ogawa, W. W. Walker, H. Tamura, and T. Kuroda, "A 10-Gb/s receiver with series equalizer and on-chip ISI monitor in 0.11-μm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 40, pp. 986–993, April 2005.
- [56] R. Kreienkamp, U. Langmann, C. Zimmermann, T. Aoyama, and H. Siedhoff, "A 10-Gb/s CMOS clock and data recovery circuit with an analog phase interpolator," *IEEE Journal of Solid-State Circuits*, vol. 40, pp. 736–743, March 2005.
- [57] T.-S. Chen, "A 10 Gb/s CMOS half-rate clock and data recovery circuit with direct bang-bang tuning," in *International Workshop on Radio-Frequency Integration Technology: Integrated Circuits for Wideband Communication and Wireless Sensor Networks. Proceedings.*, pp. 57–60, IEEE, December 2 2005.
- [58] D. Kucharski and K. T. Kornegay, "2.5 V 43-45 Gb/s CDR circuit and 55 Gb/s PRBS generator in SiGe using a low-voltage logic family," *IEEE Journal of Solid-State Circuits*, vol. 41, pp. 2154–2165, September 2006.
- [59] C.-F. Liang, S.-C. Hwu, and S.-I. Liu, "A 10Gbps burst-mode CDR circuit in 0.18 μm CMOS," in *Custom Integrated Circuits Conference*, pp. 599–602, IEEE, September 10-13 2006.
- [60] S. Gondi and B. Razavi, "A 10-Gb/s CMOS merged adaptive equalizer/CDR circuit for serial-link receivers," in *Symposium on VLSI Circuits. Digest of Technical Papers.*, pp. 194–195, IEEE, June 15-17 2006.
- [61] S. Byun, J. C. Lee, J. H. Shim, K. Kim, and H.-K. Yu, "A 10-Gb/s CMOS CDR and DEMUX IC with a quarter-rate linear phase detector," *IEEE Journal of Solid-State Circuits*, vol. 41, pp. 2566–2576, November 2006.
- [62] J. G. Kenney, D. Dalton, E. Evans, M. H. Eskiyerli, B. Hilton, D. Hitchcox, T. Kwok, D. Mulcahy, C. McQuilkin, V. Reddy, S. Selvanayagam, P. Shepherd, W. S. Titus, and L.
DeVito, "A 9.95-11.3-Gb/s XFP transceiver in 0.13-μm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 41, pp. 2901–2910, December 2006.
- [63] Y. Ohtomo, K. Nishimura, and M. Nogawa, "A 12.5-Gb/s parallel phase detection clock and data recovery circuit in 0.13-μm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 41, pp. 2052–2057, 2006.
- [64] A. Momtaz, D. Chung, N. Kocaman, J. Cao, M. Caresosa, B. Zhang, and I. Fujimori, "A fully integrated 10-Gb/s receiver with adaptive optical dispersion equalizer in 0.13-μm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 42, pp. 872–880, April 2007.
- [65] J. Li and J. Silva-Martinez, "A fully on-chip 10Gb/s CDR in a standard 0.18 μm CMOS technology," in *Radio Frequency Integrated Circuits (RFIC) Symposium*, pp. 237–240, IEEE, June 3-5 2007.
- [66] K. Sano, K. Murata, S. Sugitani, H. Sugahara, and T. Enoki, "50-Gb/s 4-b multiplexer/demultiplexer chip set using InP HEMT," *IEEE Journal of Solid-State Circuits*, vol. 38, pp. 1504–1511, September 2003.
- [67] K. Murata, K. Sano, H. Kitabayashi, S. Sugitani, H. Sugahara, and T. Enoki, "100-Gb/s multiplexing and demultiplexing IC operations in InP HEMT technology," *IEEE Journal of Solid-State Circuits*, vol. 39, pp. 207–213, January 2004.
- [68] K. Sano, H. Fukuyama, K. Murata, K. Kurishima, N. Kashio, T. Enoki, and H. Sugahara, "Up to 80-Gbit/s operations of 1:4 demultiplexer IC with InP HBTs," in *Compound Semiconductors Integrated Circuit Symposium (CSICS) Technical Digest*, pp. 264–267, IEEE, October 30-November 2 2005.
- [69] K. Ishii, H. Nosaka, K. Sano, K. Murata, M. Ida, K. Kurishima, M. Hirata, T. Shibata, and T. Enoki, "High-bit-rate low-power decision circuit using InP-InGaAs HBT technology," *IEEE Journal of Solid-State Circuits*, vol. 40, pp. 1583–1588, July 2005.
- [70] K. Ishii, K. Sano, K. Murata, M. Ida, K. Kurishima, T. Shibata, T. Enoki, and H.
Sugahara, "90 Gbit/s 0.5 W decision circuit using InP/InGaAs double heterojunction bipolar transistors," *IEEE Electronics Letters*, vol. 40, pp. 1020–1021, August 5 2004.
- [71] Y. Suzuki, M. Mamada, and Z. Yamazaki, "Over-100-Gb/s 1:2 demultiplexer based on InP HBT technology," *IEEE Journal of Solid-State Circuits*, vol. 42, pp. 2594–2599, November 2007.
- [72] C. Kärnfelt, J. Hallin, T. Kjellberg, B. Hansson, and T. Swahn, "Flip-chip mounted 1:4 demultiplexer IC in InP DHBT technology operating up to 100 Gb/s." To be published, 2007.
- [73] J. Hallin, T. Kjellberg, and T. Swahn, "A 100-Gb/s 1:4 demultiplexer in InP DHBT technology," in *Compound Semiconductor Integrated Circuit Symposium (CSICS)*, pp. 227–230, IEEE, November 12-15 2006.
- [74] J. Hallin, T. Kjellberg, and T. Swahn, "A 165-Gb/s 4:1 multiplexer in InP DHBT technology," *IEEE Journal of Solid-State Circuits*, vol. 41, pp. 2209–2214, October 2006.
- [75] T. Swahn, J. Hallin, and T. Kjellberg, "Design and test of InP DHBT ICs for a 100 Gb/s demonstrator system," in *International Indium Phosphide and Related Materials Conference Proceedings*, pp. 79–84, IEEE, 7-11 May 2006.
- [76] T. Swahn, T. Kjellberg, and J. Hallin, "100-165 Gb/s InP DHBT integrated circuits for a transceiver demonstrator," in *7th Topical Workshop on Heterostructure Microelectronics (TWHM)*, pp. 115–116, IEICE & IEEE, August 21-24 2007.
- [77] N. Behjou, "Design and simulation of very high speed frequency dividers in InP-DHBT IC technology," Master's thesis, Chalmers University of Technology, 2004.
- [78] E. A. Sovero and B. Li, "Monolithic InP HBT W-band VCO-static divider," in *2004 IEEE MTT-S International Microwave Symposium Digest*, vol. 3, pp. 1325–1328, IEEE, 6-11 June 2004.
- [79] M. Rodwell, Z. Griffith, V. Paidi, N. Parthasarathy, C. Sheldon, U. Singisetti, M. Urteaga, R. Pierson, and B.
Brar, "InP HBT digital ICs and MMICs in the 140-220 GHz band," in *The Joint 30th International Conference on Infrared and Millimeter Waves and 13th International Conference on Terahertz Electronics*, vol. 2, pp. 620–621, September 19-23 2005.
- [80] R.-E. Makon, K. Schneider, R. Driad, M. Lang, R. Aidam, R. Quay, and G. Weimann, "Fundamental low phase noise InP-based DHBT VCOs with high output power operating up to 75 GHz," in *Compound Semiconductor Integrated Circuit Symposium, 2004*, pp. 159–162, IEEE, 24-27 October 2004.
- [81] R. Makon, R. Driad, K. Schneider, H. Massler, R. Aidam, R. Quay, M. Schlechtweg, and G. Weimann, "Fundamental low phase noise InP-based DHBT VCO operating up to 89 GHz," *Electronics Letters*, vol. 41, pp. 37–38, 18 August 2005.
- [82] K. W. Kobayashi, A. K. Oki, L. T. Tran, J. C. Cowles, A. Gutierrez-Aitken, F. Yamada, T. R. Block, and D. C. Streit, "A 108-GHz InP-HBT monolithic push-push VCO with low phase noise and wide tuning bandwidth," *IEEE Journal of Solid-State Circuits*, vol. 34, pp. 1225–1232, September 1999.
- [83] K. Uchida, I. Aoki, H. Matsuura, T. Yakihara, S. Kobayashi, S. Oka, T. Fujita, and A. Miura, "104 and 134 GHz InGaP/InGaAs HBT oscillators," in *Gallium Arsenide Integrated Circuit (GaAs IC) Symposium*, pp. 237–240, IEEE, October 17-20 1999.
- [84] Y. Baeyens, C. Dorschky, N. Weimann, Q. Lee, R. Kopf, G. Georgiou, J.-P. Mattia, R. Hamm, and Y.-K. Chen, "Compact InP-based HBT VCOs with a wide tuning range at W-band and D-band," *IEEE Transactions on Microwave Theory and Techniques*, vol. 48, pp. 2403–2408, December 2000.
- [85] Y. Baeyens and Y. K. Chen, "A monolithic integrated 150 GHz SiGe HBT push-push VCO with simultaneous differential V-band output," in *2003 IEEE MTT-S International Microwave Symposium Digest*, vol. 2, pp. 877–880, IEEE, June 8-13 2003.
- [86] F. Lenk, M. Schott, J. Hilsenbeck, J. Wurfl, and W.
Heinrich, "Low phase-noise monolithic GaInP/GaAs-HBT VCO for 77 GHz," in *2003 IEEE MTT-S International Microwave Symposium Digest*, vol. 2, pp. 903–906, IEEE, June 8-13 2003.
- [87] H. Li, H.-M. Rein, and T. Suttorp, "Design of W-band VCOs with high output power for potential application in 77 GHz automotive radar systems," in *Gallium Arsenide Integrated Circuit (GaAs IC) Symposium*, pp. 263–266, IEEE, November 9-12 2003.
- [88] H. Li, H.-M. Rein, and M. Schwerd, "SiGe VCOs operating up to 88 GHz, suitable for automotive radar sensors," *Electronics Letters*, vol. 39, pp. 1326–1327, September 4 2003.
- [89] H. Li, H.-M. Rein, T. Suttorp, and J. Böck, "Fully integrated SiGe VCOs with powerful output buffer for 77-GHz automotive radar systems and applications around 100 GHz," *IEEE Journal of Solid-State Circuits*, vol. 39, pp. 1650–1658, October 2004.
- [90] Z. Lao, J. Jensen, K. Guinn, and M. Sokolich, "80-GHz differential VCO in InP SHBTs," *IEEE Microwave and Wireless Components Letters*, vol. 14, pp. 407–409, September 2004.
- [91] H. S. Yourke, "Millimicrosecond transistor current switching," *IRE Transactions on Circuit Theory*, vol. 4, no. 3, pp. 236–240, 1957.
- [92] J. J. Ebers and J. L. Moll, "Large-signal behavior of junction transistors," *Proceedings of the IRE*, vol. 42, pp. 1761–1772, December 1954.
- [93] H.-M. Rein and M. Möller, "Design considerations for very-high-speed Si-bipolar IC's operating up to 50 Gb/s," *IEEE Journal of Solid-State Circuits*, vol. 31, pp. 1076–1090, August 1996.
- [94] S. S. Mohan, M. del Mar Hershenson, S. P. Boyd, and T. H. Lee, "Bandwidth extension in CMOS with optimized on-chip inductors," *IEEE Journal of Solid-State Circuits*, vol. 35, pp. 346–355, March 2000.
- [95] J.-C. Chien and L.-H. Lu, "A 20-Gb/s 1:2 demultiplexer with capacitive-splitting current-mode-logic latches," *IEEE Transactions on Microwave Theory and Techniques*, vol. 55, pp. 1624–1632, August 2007.
- [96] W.
Liu, *Fundamentals of III-V devices: HBTs, MESFETs, and HFETs/HEMTs*. John Wiley & Sons, Inc., 1999.
- [97] J. S. Yuan, *SiGe, GaAs, and InP heterojunction bipolar transistors*. John Wiley & Sons, Inc., 1999.
- [98] D. A. Johns and K. W. Martin, *Analog integrated circuit design*. John Wiley & Sons, Inc., 2nd ed., 1997.
- [99] H. Zirath, R. Kozhuharov, and M. Ferndahl, "A x2 coupled Colpitt VCO with ultra low phase noise," in *Compound Semiconductor Integrated Circuit Symposium (CSICS)*, pp. 155–158, IEEE, 24-27 October 2004.
- [100] D. B. Leeson, "A simple model of feedback oscillator noise spectrum," *Proceedings of the IEEE*, vol. 54, pp. 329–330, February 1966.
- [101] X. Zhang, D. Sturzebecher, and A. Daryoush, "Comparison of the phase noise performance of HEMT and HBT based oscillators," in *1995 IEEE MTT-S International Microwave Symposium Digest*, vol. 2, pp. 697–700, IEEE, 16-20 May 1995.
- [102] H. Li and H.-M. Rein, "Millimeter-wave VCOs with wide tuning range and low phase noise, fully integrated in a SiGe bipolar production technology," *IEEE Journal of Solid-State Circuits*, vol. 38, pp. 184–191, February 2003.
- [103] H. Li, H.-M. Rein, R.-E. Makon, and M. Schwerd, "Wideband VCOs in SiGe production technology operating up to about 70 GHz," *IEEE Microwave and Wireless Components Letters*, vol. 13, pp. 425–427, October 2003.
- [104] L. Zhang, R. Pullela, C. Winczewski, J. Chow, D. Mensa, S. Jaganathan, and R. Yu, "A 37-50GHz InP HBT VCO IC for OC-768 fiber optic communication applications," in *Radio Frequency Integrated Circuits (RFIC) Symposium*, pp. 85–88, IEEE, 2-4 June 2002.
- [105] H. Zirath, T. Masuda, R. Kozhuharov, and M. Ferndahl, "Development of 60-GHz front-end circuits for a high-data-rate communication system," *IEEE Journal of Solid-State Circuits*, vol. 39, pp. 1640–1649, October 2004.
- [106] F. M. Gardner, *Phaselock techniques*. John Wiley & Sons, Inc., 3rd ed., 2005.
- [107] B. Razavi, *RF microelectronics*. Prentice-Hall, Inc., 1998.
- [108] T.
H. Lee, *The design of CMOS radio-frequency integrated circuits*. Cambridge University Press, 1998.
- [109] S. Cheng, H. Tong, J. Silva-Martinez, and A. I. Karsilayan, "Steady state analysis of phase-locked loops using binary phase detector," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 54, pp. 474–478, June 2007.
- [110] P. R. Gray and R. G. Meyer, *Analysis and design of analog integrated circuits*. John Wiley & Sons, Inc., 3rd ed., 1993.
- [111] H. L. Krauss, C. W. Bostian, and F. H. Raab, *Solid state radio engineering*. John Wiley & Sons, Inc., 1980.
- [112] ITU-T, "G.958 digital line systems based on the synchronous digital hierarchy for use on optical fibre cables digital sections and digital line systems (study group 15)," January 1 1994.
- [113] G. P. Agrawal, *Fiber-optic communication systems*. John Wiley & Sons, Inc., 1992.
- [114] C. D. Holdenried, J. W. Haslett, and M. W. Lynch, "Analysis and design of HBT Cherry-Hooper amplifiers with emitter-follower feedback for optical communications," *IEEE Journal of Solid-State Circuits*, vol. 39, pp. 1959–1967, November 2004.
- [115] T. Kjellberg, J. Hallin, and T. Swahn, "104Gb/s $2^{11}-1$ and 110Gb/s $2^9-1$ PRBS generator in InP HBT technology," in *International Solid-State Circuits Conference (ISSCC)*. Digest of Technical Papers., pp. 2160–2169, IEEE, February 6-9 2006.
- [116] R. E. Collin, *Foundations for microwave engineering*. McGraw-Hill Book Co., 2nd ed., 1992.

### Appendix A

## Various Calculations

Some calculations referred to in the thesis have been placed in this appendix. This enhances the readability by avoiding digressions.

### A.1 Differential stage

A differential stage is shown in fig. A.1. The output voltage is a function of the input voltage.
Figure A.1: Differential stage marked with currents and voltages.

The three current equations can be stated as:

$$I_A = \frac{I_S}{\beta + 1} \left\{ \exp\left(\frac{V_A - V_{CC}}{V_T}\right) - 1 \right\} \tag{A.1}$$

$$I_B = \frac{I_S}{\beta + 1} \left\{ \exp\left(\frac{V_B - V_{CC}}{V_T}\right) - 1 \right\} \tag{A.2}$$

$$I_{CC} = (\beta + 1)(I_A + I_B) \tag{A.3}$$

The differential output voltage is:

$$V_{out} = V_{AA} - V_{BB} = R(I_{BB} - I_{AA}) = R\beta(I_B - I_A) \tag{A.4}$$

These four equations are used to derive an expression for $V_{out}$ as a function of $V_{in}$. $V_{CC}$ is isolated from eq. A.1:

$$V_{CC} = V_A - V_T \ln \left( \frac{I_A (\beta + 1)}{I_S} + 1 \right) \tag{A.5}$$

$V_{CC}$ is inserted in the remaining equations:

$$I_{CC} = (\beta + 1)(I_A + I_B) \tag{A.6}$$

$$V_{out} = R\beta \left( I_B - I_A \right) \tag{A.7}$$

$$I_{B} = \frac{I_{S}}{\beta + 1} \left\{ \left( \frac{I_{A} (\beta + 1)}{I_{S}} + 1 \right) \exp \left( \frac{V_{B} - V_{A}}{V_{T}} \right) - 1 \right\} \tag{A.8}$$

$I_A$ is isolated from eq. A.6:

$$I_A = \frac{I_{CC}}{\beta + 1} - I_B \tag{A.9}$$

$I_A$ is inserted:

$$V_{out} = R\beta \left(2I_B - \frac{I_{CC}}{\beta + 1}\right) \tag{A.10}$$

$$I_{B} = \frac{I_{S}}{\beta + 1} \left\{ \left( \frac{I_{CC} - I_{B} (\beta + 1)}{I_{S}} + 1 \right) \exp \left( \frac{V_{B} - V_{A}}{V_{T}} \right) - 1 \right\} \tag{A.11}$$

$I_B$ is isolated from eq. A.10:

$$I_B = \frac{V_{out}}{2R\beta} + \frac{I_{CC}}{2(\beta+1)} \tag{A.12}$$

$I_B$ is inserted in eq. A.11:

$$\frac{V_{out}}{2R\beta} + \frac{I_{CC}}{2(\beta+1)} = \frac{I_S}{\beta+1} \left\{ \left( \frac{I_{CC} - \left(\frac{V_{out}}{2R\beta} + \frac{I_{CC}}{2(\beta+1)}\right)(\beta+1)}{I_S} + 1 \right) \exp\left(\frac{V_B - V_A}{V_T}\right) - 1 \right\} \tag{A.13}$$

This expression is reduced in several steps, isolating $V_{out}$:

$$\frac{V_{out}}{2R\beta} + \frac{I_{CC}}{2(\beta+1)} = \frac{I_S}{\beta+1} \left( \frac{I_{CC} - \left(\frac{V_{out}}{2R\beta} + \frac{I_{CC}}{2(\beta+1)}\right)(\beta+1)}{I_S} + 1 \right) \exp\left(\frac{V_B - V_A}{V_T}\right) - \frac{I_S}{\beta+1} \Rightarrow \tag{A.14}$$

$$\frac{V_{out}}{2R\beta} + \frac{I_{CC}}{2(\beta+1)} = \frac{I_S}{\beta+1} \left( \frac{I_{CC}}{I_S} - \frac{\frac{V_{out}}{2R\beta} (\beta+1)}{I_S} - \frac{I_{CC}}{2I_S} + 1 \right) \exp\left( \frac{V_B - V_A}{V_T} \right) - \frac{I_S}{\beta+1} \Rightarrow \tag{A.15}$$

$$\frac{V_{out}}{2R\beta} + \frac{I_{CC}}{2(\beta+1)} = \left(\frac{I_{CC}}{\beta+1} - \frac{V_{out}}{2R\beta} - \frac{I_{CC}}{2(\beta+1)} + \frac{I_S}{\beta+1}\right) \exp\left(\frac{V_B - V_A}{V_T}\right) - \frac{I_S}{\beta+1} \Rightarrow \tag{A.16}$$

$$\frac{V_{out}}{2R\beta} + \frac{I_{CC}}{2(\beta+1)} = \left(\frac{I_{CC}}{2(\beta+1)} - \frac{V_{out}}{2R\beta} + \frac{I_S}{\beta+1}\right) \exp\left(\frac{V_B - V_A}{V_T}\right) - \frac{I_S}{\beta+1} \Rightarrow \tag{A.17}$$

$$\frac{V_{out}}{2R\beta} + \frac{V_{out}}{2R\beta} \exp\left(\frac{V_B - V_A}{V_T}\right) = \left(\frac{I_{CC}}{2(\beta+1)} + \frac{I_S}{\beta+1}\right) \exp\left(\frac{V_B - V_A}{V_T}\right) - \frac{I_S}{\beta+1} - \frac{I_{CC}}{2(\beta+1)} \Rightarrow \tag{A.18}$$

$$V_{out} = \frac{2R\beta}{\exp\left(\frac{V_B - V_A}{V_T}\right) + 1} \left\{ \left(\frac{I_{CC}}{2(\beta + 1)} + \frac{I_S}{\beta + 1}\right) \exp\left(\frac{V_B - V_A}{V_T}\right) - \frac{I_S}{\beta + 1} - \frac{I_{CC}}{2(\beta + 1)} \right\} \Rightarrow \tag{A.19}$$

$$V_{out} = \frac{2R\beta}{\exp\left(\frac{V_B - V_A}{V_T}\right) + 1} \left(\frac{I_{CC}}{2(\beta + 1)} + \frac{I_S}{\beta + 1}\right) \left(\exp\left(\frac{V_B - V_A}{V_T}\right) - 1\right) \Rightarrow \tag{A.20}$$

$$V_{out} = \frac{R\beta \left(I_{CC} + 2I_S\right)}{(\beta + 1)} \times \frac{\exp\left(\frac{V_B - V_A}{V_T}\right) - 1}{\exp\left(\frac{V_B - V_A}{V_T}\right) + 1} \tag{A.21}$$

Since $(e^x - 1)/(e^x + 1) = \tanh(x/2)$, the exponential ratio reduces to a hyperbolic tangent with half the argument:

$$V_{out} = \frac{R\beta \left(I_{CC} + 2I_S\right)}{(\beta + 1)} \times \tanh\left(\frac{V_B - V_A}{2V_T}\right) \Rightarrow \tag{A.22}$$

$$V_{out} = \frac{R\beta \left(I_{CC} + 2I_S\right)}{(\beta + 1)} \times \tanh\left(\frac{-V_{in}}{2V_T}\right) \tag{A.23}$$
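The closed form can be sanity-checked numerically. The sketch below solves eqs. A.1 to A.3 directly for the two base currents and compares the resulting $V_{out}$ (eq. A.4) with the expression obtained from eq. A.21 through the identity $(e^x-1)/(e^x+1) = \tanh(x/2)$, taking $V_{in} = V_A - V_B$. The function names and all component values ($I_S$, $\beta$, $V_T$, $R$, $I_{CC}$) are illustrative assumptions and not VIP-2 process data.

```python
import math


def vout_from_currents(v_a, v_b, i_cc, i_s, beta, v_t, r):
    """Solve eqs. A.1-A.3 for the base currents, then apply eq. A.4."""
    k = i_s / (beta + 1.0)
    e_a = math.exp(v_a / v_t)
    e_b = math.exp(v_b / v_t)
    # u = exp(-V_CC / V_T); found by adding A.1 and A.2 and inserting A.3.
    u = (i_cc + 2.0 * i_s) / (i_s * (e_a + e_b))
    i_a = k * (e_a * u - 1.0)  # eq. A.1
    i_b = k * (e_b * u - 1.0)  # eq. A.2
    return r * beta * (i_b - i_a)  # eq. A.4


def vout_closed_form(v_in, i_cc, i_s, beta, v_t, r):
    """Closed form with the half-argument tanh, V_in = V_A - V_B."""
    scale = r * beta * (i_cc + 2.0 * i_s) / (beta + 1.0)
    return scale * math.tanh(-v_in / (2.0 * v_t))


# Assumed example values (hypothetical, chosen so that R * I_CC = 300 mV).
PARAMS = dict(i_cc=4e-3, i_s=1e-15, beta=50.0, v_t=0.02585, r=75.0)

if __name__ == "__main__":
    for v_in in (-0.2, -0.05, 0.0, 0.05, 0.2):
        # Inputs are placed symmetrically around 0 V; only V_A - V_B matters.
        direct = vout_from_currents(v_in / 2, -v_in / 2, **PARAMS)
        closed = vout_closed_form(v_in, **PARAMS)
        print(f"V_in = {v_in:+.2f} V: direct = {direct:+.6f} V, closed = {closed:+.6f} V")
```

Under these assumptions the two columns should agree to within floating-point rounding, and the output saturates at roughly $\pm R\beta I_{CC}/(\beta+1)$ for large $|V_{in}|$.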
The differential input voltage used in eq. A.23 is defined as:

$$V_{in} = V_A - V_B \tag{A.24}$$

### A.2 Inductive line

A transmission line with a load is shown in fig. A.2. The task is to determine the voltage at the load, $V_L$, as a function of the input voltage, $V_S$. The transmission line is assumed to be ideal (without losses) to simplify the example.

Figure A.2: Loaded transmission line.

To satisfy the boundary conditions, $V_S$ must be the sum of the incident wave, $V^+$, and the reflected wave, $V^- = \Gamma_L V^+$ [116]:

$$V_S = V^+ e^{j\beta l} + V^- e^{-j\beta l} = V^+ e^{j\beta l} \left( 1 + \Gamma_L e^{-2j\beta l} \right) \tag{A.25}$$

The reflection coefficient at the load, $\Gamma_L$, has been given previously:

$$\Gamma_L = \frac{Z_L - Z_0}{Z_L + Z_0} \tag{A.26}$$

$V^+$ is isolated:

$$V^{+} = \frac{V_S}{e^{j\beta l} + \Gamma_L e^{-j\beta l}} = \frac{V_S (Z_L + Z_0)}{2 (Z_L \cos \beta l + j Z_0 \sin \beta l)} \tag{A.27}$$

$V_L$ is the sum of $V^+$ and $V^-$:

$$V_L = V^+ + V^- \Rightarrow \tag{A.28}$$

$$V_L = V^+ (1 + \Gamma_L) \Rightarrow \tag{A.29}$$

$$V_L = \frac{V_S \left( Z_L + Z_0 \right)}{2 \left( Z_L \cos \beta l + j Z_0 \sin \beta l \right)} \left( 1 + \frac{Z_L - Z_0}{Z_L + Z_0} \right) \Rightarrow \tag{A.30}$$

$$V_L = \frac{V_S \left( Z_L + Z_0 \right)}{2 \left( Z_L \cos \beta l + j Z_0 \sin \beta l \right)} \left( \frac{2Z_L}{Z_L + Z_0} \right) \Rightarrow \tag{A.31}$$

$$V_L = V_S \frac{Z_L}{Z_L \cos \beta l + j Z_0 \sin \beta l} \Rightarrow \tag{A.32}$$

$$V_L = V_S \frac{1}{\cos \beta l + j \frac{Z_0}{Z_L} \sin \beta l} \tag{A.33}$$

Unsurprisingly:

$$V_L \propto V_S \tag{A.34}$$

#### A.2.1 Effective inductance

The next step is to examine how a transmission line can effectively act as an inductor. Two simple tanks are presented in figs. A.3 & A.4.

Figure A.3: LC-tank.

Figure A.4: LC-tank with a transmission line acting as the inductive element.

$V_L$ for the LC tank is derived
effortlessly by voltage division:

$$V_L = V_S \frac{\frac{1}{j\omega C}}{j\omega L + \frac{1}{j\omega C}} \Rightarrow \tag{A.35}$$

$$V_L = V_S \frac{1}{j^2 \omega^2 LC + 1} \Rightarrow \tag{A.36}$$

$$V_L = V_S \frac{\frac{1}{LC}}{\frac{1}{LC} - \omega^2} \Rightarrow \tag{A.37}$$

$$V_L = V_S \frac{\frac{1}{LC}}{\left(\sqrt{\frac{1}{LC}} - \omega\right)\left(\sqrt{\frac{1}{LC}} + \omega\right)} \tag{A.38}$$

The tank will oscillate at:

$$\omega_{osc} = \sqrt{\frac{1}{LC}} \tag{A.39}$$

The second tank contains a transmission line loaded with a capacitor. The load is:

$$Z_L = \frac{1}{j\omega C} \tag{A.40}$$

This is inserted in eq. A.33:

$$V_L = V_S \frac{1}{\cos \beta l + j \frac{Z_0}{(j\omega C)^{-1}} \sin \beta l} \Rightarrow \tag{A.41}$$

$$V_L = V_S \frac{1}{\cos \beta l - Z_0 \omega C \sin \beta l} \tag{A.42}$$

This is not a polynomial, but the amplification approaches $\infty$ when the denominator approaches 0. This occurs at:

$$\cos \beta l - Z_0 \omega C \sin \beta l = 0 \Rightarrow \tag{A.43}$$

$$\cos \beta l = Z_0 \omega C \sin \beta l \tag{A.44}$$

The condition for oscillation in an LC-tank, eq. A.39, is used:

$$\omega_{osc} = \sqrt{\frac{1}{LC}} \Rightarrow \tag{A.45}$$

$$C = \frac{1}{\omega_{osc}^2 L} \tag{A.46}$$

The next step is to assume that the condition in eq. A.44 is fulfilled at $\omega = \omega_{osc}$, and to insert eq. A.46 with $L$ replaced by the effective inductance $L_{eff}$:

$$\cos \beta l = \frac{Z_0 \omega}{\omega^2 L_{eff}} \sin \beta l \Rightarrow \tag{A.47}$$

$$L_{eff} = \frac{Z_0}{\omega} \frac{\sin \beta l}{\cos \beta l} \Rightarrow \tag{A.48}$$

$$L_{eff} = \frac{Z_0}{\omega} \tan \beta l \tag{A.49}$$

The transmission line is thus able to act as an inductor, and the effective inductance is given by eq. A.49. It is interesting to notice that:

$$L_{eff} \propto Z_0 \tag{A.50}$$

Achieving a sufficiently high $Z_0$ for the inductor to be practical can be problematic. The second factor in eq.
A.49 reveals that the transmission line becomes capacitive immediately beyond:

$$\beta l = \pi/2 \Rightarrow \tag{A.51}$$

$$\frac{2\pi}{\lambda}l = \pi/2 \Rightarrow \tag{A.52}$$

$$l = \lambda/4 \tag{A.53}$$

$L_{eff}$ grows without bound as $l$ approaches $\lambda/4$ and thus becomes increasingly difficult to gauge accurately. This suggests that $l$ should be kept safely below this length.

A small example will clarify the possibilities. Let $Z_0 = 52~\Omega$, which is the highest attainable value for a transmission line in VIP-2, see section 2.2.4, and let $\omega = 6.28\times10^{11}~\mathrm{rad/s}$, equivalent to a frequency of 100 GHz. The length of the transmission line is $l = \lambda/12$ (safely below $\lambda/4$), which corresponds to about 150 $\mu$m for VIP-2. This yields:

$$L_{eff} = \frac{52~\Omega}{6.28 \times 10^{11}~\mathrm{rad/s}} \tan \frac{\pi}{6} = 48~\mathrm{pH} \tag{A.54}$$

It may not seem like much. A transistor driving a 0 to 4 mA current switch in 4 ps would create a temporary voltage over the inductor of:

$$L_{eff}\dot{I} = 48~\mathrm{pH} \times \frac{4~\mathrm{mA}}{4~\mathrm{ps}} = 48~\mathrm{mV} \tag{A.55}$$

This is 16% of the 300 mV logic voltage level employed and a noticeable improvement of transition speed. The penalty is ringing/overshoot. The phase shift may also be undesirable, depending on the application.
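The numbers in eqs. A.54 and A.55 can be reproduced with a few lines of Python. This is only a sketch: the function name `effective_inductance` is introduced here for illustration, and the input values are the ones quoted in the example above.

```python
import math

def effective_inductance(z0_ohm, omega_rad_s, beta_l_rad):
    """Effective inductance of the line according to eq. A.49."""
    return z0_ohm / omega_rad_s * math.tan(beta_l_rad)

# Values from the example in the text.
z0 = 52.0                       # characteristic impedance [ohm]
omega = 2.0 * math.pi * 100e9   # angular frequency at 100 GHz [rad/s]
beta_l = 2.0 * math.pi / 12.0   # electrical length for l = lambda/12 [rad]

l_eff = effective_inductance(z0, omega, beta_l)

# Voltage over the inductor for a 0 to 4 mA current step in 4 ps (eq. A.55).
di_dt = 4e-3 / 4e-12            # [A/s]
v_step = l_eff * di_dt

print(f"L_eff  = {l_eff * 1e12:.1f} pH")
print(f"V_step = {v_step * 1e3:.1f} mV "
      f"({100.0 * v_step / 0.3:.0f}% of 300 mV)")
```

Sweeping `beta_l` towards $\pi/2$ in the same function shows how quickly $L_{eff}$ rises near $l = \lambda/4$, which is why the text recommends staying well below that length.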
#### A.2.2 Negative reactance in the load

$$V_L = V_S \frac{1}{\cos \beta l + j \frac{Z_0}{Z_L} \sin \beta l} \tag{A.56}$$

$$Z_0 = R_L \tag{A.57}$$

$$Z_L = \frac{R_L}{R_L C_L s + 1} \tag{A.58}$$

$$V_L = V_S \frac{1}{\cos \beta l + j \frac{Z_0(R_L C_L j\omega + 1)}{R_L} \sin \beta l} \tag{A.59}$$

$$V_L = V_S \frac{1}{\cos \omega K l + j \left( R_L C_L j \omega + 1 \right) \sin \omega K l}, \ Z_0 = R_L \tag{A.60}$$

$$V_L = V_S \frac{1}{\cos \omega K l - R_L C_L \omega \sin \omega K l + j \sin \omega K l} \tag{A.61}$$

$$V_L = V_S \frac{1}{\cos \beta l + j \frac{Z_0}{Z_L} \sin \beta l} \tag{A.62}$$

$$\beta = \omega \sqrt{LC} = \omega K \tag{A.63}$$

$$V_L = V_S \frac{1}{\cos \omega K l + j \frac{Z_0}{Z_L} \sin \omega K l} \tag{A.64}$$

$$Z_0 = 50~\Omega \tag{A.65}$$

$$Z_L = \frac{R_L}{R_L C_L s + 1} = \frac{1}{C_L} \times \frac{1}{s + \frac{1}{R_L C_L}} = \frac{1}{C_L} \times \frac{1}{j\omega + \frac{1}{R_L C_L}} \tag{A.66}$$

$$V_L = V_S \frac{1}{\cos \omega K l + j \frac{Z_0(R_L C_L j \omega + 1)}{R_L} \sin \omega K l} \tag{A.67}$$

$$V_L = V_S \frac{1}{\cos \omega K l + j \left( R_L C_L j \omega + 1 \right) \sin \omega K l}, \ Z_0 = R_L \tag{A.68}$$

$$V_L = V_S \frac{1}{\cos \omega K l - R_L C_L \omega \sin \omega K l + j \sin \omega K l} \tag{A.69}$$

### A.3 Relative jitter generation in Alexander-type vs. double Alexander-type phase detectors

### Appendix B

### Various Simulations

Some simulations referred to in the thesis have been placed in this appendix. This enhances the readability by avoiding digressions.

### B.1 HBT input capacitance

Estimating transistor input capacitance is primarily an occupation of digital IC designers, who must be able to gauge fan-in and fan-out both quickly and accurately. None of the circuits presented in this thesis strictly requires such an approach, but the magnitude still has relevance for the resonance of the inductive line, the tank and parasitics of the VCO and the tuning of the static divider.
The capacitance is frequency dependent, and the simplest way to determine it is to drive the base with an ideal voltage source through a source resistor, $R_S$. The capacitance can then be calculated from the amplitude of the base voltage, $|V_b|$ (normalized to the source amplitude):

$$C_b = \frac{\sqrt{1 - |V_b|^2}}{\omega |V_b| R_S} = \frac{\sqrt{1 - |V_b|^2}}{2\pi f |V_b| R_S} \tag{B.1}$$

The input capacitance of a minimum-size VIP-2 transistor is shown in fig. B.1. The capacitance is 7.28 fF at $f = 50$ GHz.

Figure B.1: Input capacitance of a minimum size transistor. Emitter width and length are 0.5 $\mu$m × 1.0 $\mu$m.

### B.2 Practical range of characteristic impedance for transmission lines (microstrip)

The transmission lines are commonly realized as microstrips. The only exceptions in this thesis are the inductive elements in some of the VCOs, where a CPW would ideally provide less noise. All chips utilize the principle of unbroken layers of M1 and M2 carrying $V_{SS}$ and $V_{GND}$ respectively. The layers are only opened up around active devices to minimize parasitic coupling. This confines the microstrips to M4 over M2 and M3 over M2. The characteristic impedance for these two combinations is shown in fig. B.2. The three possible configurations using M1 as the underlying substrate have been included as well.

The microstrips can be modeled directly in e.g. Cadence or ADS using a substrate and a microstrip definition. The models can be wideband and include losses. However, they are limited to simple geometric descriptions. An example of a microstrip is shown in fig. B.3. The top dielectric is different from the intermediary dielectric, and it also wraps around the M4 conductor, having equal thickness on both the conductor and the dielectric below, except where it is in the proximity of the conductor. Neither Cadence nor ADS can account for this, and the two extreme interpretations are shown in figs. B.4 & B.5.
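As an aside to section B.1, the extraction in eq. B.1 can be checked with a short round trip in Python. The sketch rests on assumptions not stated in the thesis: the transistor input is treated as a pure capacitance to ground, $|V_b|$ is normalized to the source amplitude, and $R_S = 50~\Omega$ is a hypothetical value. The function names are introduced here for illustration only.

```python
import math

def base_voltage_amplitude(c_farad, f_hz, r_s_ohm):
    """|V_b| relative to the source amplitude for R_S driving a capacitor."""
    omega = 2.0 * math.pi * f_hz
    return 1.0 / math.sqrt(1.0 + (omega * r_s_ohm * c_farad) ** 2)

def input_capacitance(v_b_norm, f_hz, r_s_ohm):
    """Eq. B.1: capacitance from the normalized base-voltage amplitude."""
    omega = 2.0 * math.pi * f_hz
    return math.sqrt(1.0 - v_b_norm ** 2) / (omega * v_b_norm * r_s_ohm)

R_S = 50.0         # hypothetical source resistance [ohm], not from the thesis
F = 50e9           # frequency quoted in section B.1 [Hz]
C_REF = 7.28e-15   # capacitance quoted in section B.1 [F]

# Forward model: amplitude a simulator would report for C_REF.
v_b = base_voltage_amplitude(C_REF, F, R_S)
# Inverse: recover the capacitance from that amplitude with eq. B.1.
c_est = input_capacitance(v_b, F, R_S)

print(f"|V_b| = {v_b:.4f}, recovered C_b = {c_est * 1e15:.2f} fF")
```

With these assumed values $|V_b|$ stays close to 1, so a larger $R_S$ may give a better-conditioned extraction; that is a property of the formula rather than a statement about how the thesis simulations were set up.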
Figure B.2: Characteristic impedance for various microstrip configurations.

Figure B.3: Microstrip (M4 over ground plane) cross section.

Figure B.4: Microstrip ignoring the effect of the top dielectric close to the conductor.

Figure B.5: Microstrip ignoring the effect of the top dielectric away from the conductor.

## Appendix C

## Schematics of select circuits

Schematics for the three VCO circuits (Colpitt #1, Colpitt #2 and LC) have been placed in this appendix, cf. figs. C.1, C.2 & C.3.

Figure C.1: Colpitt VCO circuit #1.

Figure C.2: Colpitt VCO circuit #2.

Figure C.3: LC VCO circuit.

### Appendix D

# Attenuation table for W-band measurements

Performing measurements on VCOs operating in the W-band is a time-consuming process. The attenuations of the probe, the 1.0 mm cable and the mixer are all frequency dependent. An attenuation table is therefore required to make quick conversions to account for the losses. The table, see tab. D.1, is presented in the thesis because it facilitates the interpretation of the results from the spectrum analyzer shown in the VCO chapter. The attenuation data is based on measurements of the particular component and is supplied by the manufacturers. The table shows that the attenuation varies by more than 6 dB over the W-band.
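The conversion described above amounts to subtracting the (negative) total attenuation from the spectrum analyzer reading. A minimal Python sketch follows; it holds only three of the total values from tab. D.1, and the function name and the -50 dBm reading are hypothetical, chosen purely for illustration.

```python
# Total attenuation (probe + cable + mixer) in dB, a subset of tab. D.1.
TOTAL_ATTENUATION_DB = {
    75: -41.3,
    94: -43.5,
    110: -47.6,
}

def corrected_power_dbm(measured_dbm, f_ghz):
    """Refer a spectrum analyzer reading back to the chip output."""
    # The table entries are negative gains, so subtracting them adds the loss.
    return measured_dbm - TOTAL_ATTENUATION_DB[f_ghz]

# Hypothetical reading of -50 dBm at 94 GHz.
reading_dbm = -50.0
print(f"Output power: {corrected_power_dbm(reading_dbm, 94):.1f} dBm")

# Spread of the total attenuation between the band edges.
spread_db = TOTAL_ATTENUATION_DB[75] - TOTAL_ATTENUATION_DB[110]
print(f"Spread from 75 to 110 GHz: {spread_db:.1f} dB")
```

A full implementation would carry all 36 rows and possibly interpolate between them; a dictionary lookup is used here only to keep the example short.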
| f (GHz) | Probe (dB) | Cable (dB) | Mixer (dB) | Total (dB) |
|---------|------------|------------|------------|------------|
| 75 | -0.881 | -2.03 | -38.4 | -41.3 |
| 76 | -0.935 | -2.05 | -38.2 | -41.2 |
| 77 | -0.969 | -2.07 | -38.4 | -41.4 |
| 78 | -0.936 | -2.08 | -38.3 | -41.3 |
| 79 | -0.934 | -2.10 | -38.0 | -41.0 |
| 80 | -0.982 | -2.12 | -38.0 | -41.1 |
| 81 | -1.021 | -2.13 | -38.3 | -41.5 |
| 82 | -0.972 | -2.15 | -38.0 | -41.1 |
| 83 | -0.978 | -2.17 | -38.2 | -41.3 |
| 84 | -1.015 | -2.18 | -38.5 | -41.7 |
| 85 | -1.059 | -2.20 | -38.6 | -41.9 |
| 86 | -1.024 | -2.22 | -38.8 | -42.0 |
| 87 | -1.034 | -2.23 | -38.8 | -42.1 |
| 88 | -1.101 | -2.25 | -39.1 | -42.5 |
| 89 | -1.091 | -2.27 | -39.2 | -42.6 |
| 90 | -1.078 | -2.29 | -39.5 | -42.9 |
| 91 | -1.100 | -2.30 | -39.6 | -43.0 |
| 92 | -1.137 | -2.32 | -39.7 | -43.2 |
| 93 | -1.141 | -2.34 | -39.9 | -43.4 |
| 94 | -1.155 | -2.35 | -40.0 | -43.5 |
| 95 | -1.156 | -2.37 | -40.1 | -43.6 |
| 96 | -1.153 | -2.39 | -40.7 | -44.2 |
| 97 | -1.197 | -2.40 | -40.7 | -44.3 |
| 98 | -1.231 | -2.42 | -40.9 | -44.6 |
| 99 | -1.166 | -2.44 | -40.9 | -44.5 |
| 100 | -1.191 | -2.46 | -40.9 | -44.5 |
| 101 | -1.302 | -2.47 | -41.1 | -44.9 |
| 102 | -1.303 | -2.49 | -41.3 | -45.1 |
| 103 | -1.211 | -2.51 | -41.5 | -45.2 |
| 104 | -1.222 | -2.52 | -41.7 | -45.4 |
| 105 | -1.356 | -2.54 | -41.9 | -45.8 |
| 106 | -1.306 | -2.56 | -41.9 | -45.8 |
| 107 | -1.225 | -2.57 | -41.8 | -45.6 |
| 108 | -1.265 | -2.59 | -42.2 | -46.1 |
| 109 | -1.425 | -2.61 | -42.7 | -46.7 |
| 110 | -1.373 | -2.62 | -43.6 | -47.6 |

Table D.1: Frequency dependent attenuation in probe, cable and harmonic waveguide mixer.

## Appendix E

### VIP-2

Neither process technology, semiconductor physics nor numerical analysis is the topic of this thesis. The interested reader will find that there are excellent references available elsewhere. However, a brief description of the Vitesse VIP-2 process is given below:

- InP DHBT.
- 4 layers aluminium interconnect.
- 60 $\Omega$/square thin film resistors.
- 0.45 fF/$\mu$m MIM cap.
- BV 4.5 V.

Figure E.1: VIP-2 wafer cross-section.