110 research outputs found

    Clocking and Skew-Optimization For Source-Synchronous Simultaneous Bidirectional Links

    Get PDF
    There is continuous expansion of computing capabilities in mobile devices which demands higher I/O bandwidth and dense parallel links supporting higher data rates. Highspeed signaling leverages technology advancements to achieve higher data rates but is limited by the bandwidth of the electrical copper channel which have not scaled accordingly. To meet the continuous data-rate demand, Simultaneous Bi-directional (SBD) signaling technique is an attractive alternative relative to uni-directional signaling as it can work at lower clock speeds, exhibits better spectral efficiency and provides higher throughput in pad limited PCBs. For low-power and more robust system, the SBD transceiver should utilize forwarded clock system and per-pin de-skew circuits to correct the phase difference developed between the data and clock. The system can be configured in two roles, master and slave. To save more power, the system should have only one clock generator. The master has its own clock source and shares its clock to the slave through the clock channel, and the slave uses this forwarded clock to deserialize the inbound data and serialize the outbound data. A clock-to-data skew exists which can be corrected with a phase tracking CDR. This thesis presents a low-power implementation of forwarded clocking and clock-to-data skew optimization for a 40 Gbps SBD transceiver. The design is implemented in 28nm CMOS technology and consumes 8.8mW of power for 20 Gbps NRZ data at 0.9 V supply. The area occupied by the clocking 0.018 mm^2 area

    ๋ฉ”๋ชจ๋ฆฌ ์ธํ„ฐํŽ˜์ด์Šค๋ฅผ ์œ„ํ•œ 20Gbps๊ธ‰ ์ง๋ ฌํ™” ์†ก์ˆ˜์‹ ๊ธฐ ์„ค๊ณ„

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ (๋ฐ•์‚ฌ)-- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ์ „๊ธฐยท์ปดํ“จํ„ฐ๊ณตํ•™๋ถ€, 2013. 8. ์ •๋•๊ท .Various types of serial link for current and future memory interface are presented in this thesis. At first, PHY design for commercial GDDR3 memory is proposed. GDDR3 PHY is consists of read path, write path, command path. Write path and command path calibrate skew by using VDL (Variable delay line), while read path calibrates skew by using DLL (Delay locked loop) and VDL. There are four data channels and one command/address channel. Each data channel consists of one clock signal (DQS) and eight data signals (DQ). Data channel operates in 1.2Gbps (1.08Gbps~1.2Gbps), and command/address channel operates 600Mbps (540Mbps~600Mbps). In particular, DLL design for high speed and for SSN (simultaneous switching noise) is concentrated in this thesis. Secondly, serial link design for silicon photonics is proposed. Silicon photonics is the strongest candidate for next generation memory interface. Modulator driver for modulator, TIA (trans-impedance amplifier) and LA (limiting amplifier) for photo diode design are discussed. It operates above 12.5Gbps but it consumes much power 7.2mW/Gbps (transmitter core), 2mW/Gbps (receiver core) because it is connected with optical device which has large parasitic capacitance. Overall receiver which includes CDR (clock and data recovery) is also implemented. Many chips are fabricated in 65nm, 0.13um CMOS process. Finally, electrical serial link for 20Gbps memory link is proposed. Overall architecture is forwarded clocking architecture, and is very simple and intuitive. It does not need additional synchronizer. This open loop delay matched stream line receiver finds optimum sampling point with DCDL (Digitally controlled delay line) controller and expects to consume low power structurally. Only two phase half rate clock is transmitted through clock channel, but half rate time interleaved way sampling is performed by aid of initial value settable PRBS chaser. A CMOS Chip is fabricated by 65nm process and it occupies 2500um x 2500um (transceiver). It is expected that about 2.6mW(2.4mW)/Gbps (transmitter), 4.1mW(2.7mW)/Gbps (receiver). Power consumption improvement is expected in advanced process.ABSTRACT I CONTENTS V LIST OF FIGURES VII LIST OF TABLES XII CHAPTER 1 INTRODUCTION ๏ผ‘ 1.1 MOTIVATION ๏ผ‘ 1.2 THESIS ORGANIZATION ๏ผ‘๏ผ CHAPTER 2 A SERIAL LINK PHY DESIGN FOR GDDR3 MEMORY INTERFACE 11 2.1 INTRODUCTION 11 2.2 GDDR3 MEMORY INTERFACE ARCHITECTURE 12 2.2.1 READ PATH ARCHITECTURE 15 2.2.2 WRITE PATH ARCHITECTURE 17 2.2.3 COMMAND PATH ARCHITECTURE 19 2.3 DLL DESIGN FOR MEMORY INTERFACE 20 2.3.1 SSN(SIMULTANEOUS SWITCHING NOISE) 20 2.3.2 DLL ARCHITECTURE 21 2.3.3 VOLTAGE CONTROLLED DELAY LINE (VCDL) 22 2.3.4 HYSTERESIS COARSE LOCK DETECTOR (HCLD) 23 2.3.5 DYNAMIC PHASE DETECTOR AND CHARGE PUMP 26 2.4 SIMULATION RESULT 29 2.5 CONCLUSION 32 CHAPTER 3 OPTICAL FRONT-END SERIAL LINK DESIGN FOR 20 GBPS MEMORY INTERFACE 35 3.1 SILICON PHOTONICS INTRODUCTION 35 3.2 OPTICAL FRONT-END TRANSMITTER DESIGN 45 3.2.1 MODULATOR DRIVER REQUIREMENTS 46 3.2.2 MODULATOR DRIVER DESIGN - CURRENT MODE DRIVER 47 3.2.3 MODULATOR DRIVER DESIGN - CURRENT MODE DRIVER 50 3.3 OPTICAL FRONT-END RECEIVER DESIGN 55 3.3.1 OPTICAL RECEIVER BACK END REQUIREMENTS 56 3.3.2 OPTICAL RECEIVER BACK END DESIGN โ€“ TIA 57 3.3.3 OPTICAL RECEIVER BACK END DESIGN โ€“ LA, DRIVER 63 3.3.4 OPTICAL RECEIVER BACK END DESIGN โ€“ CDR 66 3.4 MEASUREMENT AND SIMULATION RESULTS 70 3.4.1 MEASUREMENT AND SIMULATION ENVIRONMENTS 70 3.4.2 OPTICAL TX FRONT END MEASUREMENT AND SIMULATION 74 3.4.3 OPTICAL RX FRONT END MEASUREMENT AND SIMULATION 77 3.4.4 OPTICAL RX BACK END SIMULATION 79 3.4.5 OPTICAL-ELECTRICAL OVERALL MEASUREMENTS 80 3.4.6 DIE PHOTO AND LAYOUT 82 3.5 CONCLUSION 86 CHAPTER 4 ELECTRICAL FRONT-END SERIAL LINK DESIGN FOR 20GBPS MEMORY INTERFACE 87 4.1 INTRODUCTION 87 4.2 CONVENTIONAL ELECTRICAL FRONT-END HIGH SPEED SERIAL LINK ARCHITECTURES 90 4.3 DESIGN CONCEPT AND PROPOSED SERIAL LINK ARCHITECTURE โ€“ OPEN LOOP DELAY MATCHED STREAM LINED RECEIVER. 95 4.3.1 PROPOSED OVERALL ARCHITECTURE 95 4.3.2 DESIGN CONCEPT 97 4.3.3 PROPOSED PROTOCOL AND LOCKING PROCESS 100 4.4 OPTIMUM POINT SEARCH ALGORITHM BASED DCDL CONTROLLER DESIGN 102 4.5 DCDL (DIGITALLY CONTROLLED DELAY LINE) DESIGN 112 4.6 DFE (DECISION FEEDBACK EQUALIZER) AND OTHER BLOCKS DESIGN 115 4.7 SIMULATION RESULTS 117 4.8 POWER EXPECTATION AND CHIP LAYOUT 122 4.9 CONCLUSION 124 CHAPTER 5 CONCLUSION 126 BIBLIOGRAPHY 128Docto

    Electronic Photonic Integrated Circuits and Control Systems

    Get PDF
    Photonic systems can operate at frequencies several orders of magnitude higher than electronics, whereas electronics offers extremely high density and easily built memories. Integrated photonic-electronic systems promise to combine advantage of both, leading to advantages in accuracy, reconfigurability and energy efficiency. This work concerns of hybrid and monolithic electronic-photonic system design. First, a high resolution voltage supply to control the thermooptic photonic chip for time-bin entanglement is described, in which the electronics system controller can be scaled with more number of power channels and the ability to daisy-chain the devices. Second, a system identification technique embedded with feedback control for wavelength stabilization and control model in silicon nitride photonic integrated circuits is proposed. Using the system, the wavelength in thermooptic device can be stabilized in dynamic environment. Third, the generation of more deterministic photon sources with temporal multiplexing established using field programmable gate arrays (FPGAs) as controller photonic device is demonstrated for the first time. The result shows an enhancement to the single photon output probability without introducing additional multi-photon noise. Fourth, multiple-input and multiple-output (MIMO) control of a silicon nitride thermooptic photonic circuits incorporating Mach Zehnder interferometers (MZIs) is demonstrated for the first time using a dual proportional integral reference tracking technique. The system exhibits improved performance in term of control accuracy by reducing wavelength peak drift due to internal and external disturbances. Finally, a monolithically integrated complementary metal oxide semiconductor (CMOS) nanophotonic segmented transmitter is characterized. With segmented design, the monolithic Mach Zehnder modulator (MZM) shows a low link sensitivity and low insertion loss with driver flexibility

    Power-Proportional Optical Links

    Get PDF
    The continuous increase in data transfer rate in short-reach links, such as chip-to-chip and between servers within a data-center, demands high-speed links. As power efficiency becomes ever more important in these links, power-efficient optical links need to be designed. Power efficiency in a link can be achieved by enabling power-proportional communication over the serial link. In power-proportional links, the power dissipated by a link is proportional to the amount of data communicated. Normally, data-rate demand is not constant, and the peak data-rate is not required all the time. If a link is not adapted according to the data-rate demand, there will be a fixed power dissipation, and the power efficiency of the link will degrade during the sub-maximal link utilization. Adapting links to real-time data-rate requirements reduces power dissipation. Power proportionality is achieved by scaling the power of the serial link linearly with the link utilization, and techniques such as variable data-rate and burst-mode can be adopted for this purpose. Links whose data rate (and hence power dissipation) can be varied in response to system demands are proposed in this work. Past works have presented rapidly reconfigurable bandwidth in variable data-rate receivers, allowing lower power dissipation for lower data-rate operation. However, maintaining synchronization during reconfiguration was not possible since previous approaches have introduced changes in front-end delay when they are reconfigured. This work presents a technique that allows rapid bandwidth adjustment while maintaining a near-constant delay through the receiver suitable for a power-scalable variable data-rate optical link. Measurements of a fabricated integrated circuit (IC) show nearly constant energy per bit across a 2ร— variation in data rate while introducing less than 10 % of a unit interval (UI) of delay variation. With continuously increasing data communication in data-centers, parallel optical links with ever-increasing per-lane data rates are being used to meet overall throughput demands. Simultaneously, power efficiency is becoming increasingly important for these links since they do not transmit useful data all the time. The burst-mode solution for vertical-cavity surface-emitting laser (VCSEL)-based point-to-point communication can be used to improve linksโ€™ energy efficiency during low link activity. The burst-mode technique for VCSEL-based links has not yet been deployed commercially. Past works have presented burst-mode solutions for single-channel receivers, allowing lower power dissipation during low link activity and solutions for fast activation of the receivers. However, this work presents a novel technique that allows rapid activation of a front-end and fast locking of a clock-and-data-recovery (CDR) for a multi-channel parallel link, utilizing opportunities arising from the parallel nature of many VCSEL-based links. The idea has been demonstrated through electrical and optical measurements of a fabricated IC at 10 Gbps, which show fast data detection and activation of the circuitry within 49 UIs while allowing the front-end to achieve better energy efficiency during low link activity. Simulation results are also presented in support of the proposed technique which allows the CDR to lock within 26 UIs from when it is powered on

    ๋ฐ์ดํ„ฐ ์ „์†ก๋กœ ํ™•์žฅ์„ฑ๊ณผ ๋ฃจํ”„ ์„ ํ˜•์„ฑ์„ ํ–ฅ์ƒ์‹œํ‚จ ๋‹ค์ค‘์ฑ„๋„ ์ˆ˜์‹ ๊ธฐ๋“ค์— ๊ด€ํ•œ ์—ฐ๊ตฌ

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ (๋ฐ•์‚ฌ)-- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ์ „๊ธฐยท์ปดํ“จํ„ฐ๊ณตํ•™๋ถ€, 2013. 2. ์ •๋•๊ท .Two types of serial data communication receivers that adopt a multichannel architecture for a high aggregate I/O bandwidth are presented. Two techniques for collaboration and sharing among channels are proposed to enhance the loop-linearity and channel-expandability of multichannel receivers, respectively. The first proposed receiver employs a collaborative timing scheme recovery which relies on the sharing of all outputs of phase detectors (PDs) among channels to extract common information about the timing and multilevel signaling architecture of PAM-4. The shared timing information is processed by a common global loop filter and is used to update the phase of the voltage-controlled oscillator with better rejection of per-channel noise. In addition to collaborative timing recovery, a simple linearization technique for binary PDs is proposed. The technique realizes a high-rate oversampling PD while the hardware cost is equivalent to that of a conventional 2x-oversampling clock and data recovery. The first receiver exploiting the collaborative timing recovery architecture is designed using 45-nm CMOS technology. A single data lane occupies a 0.195-mm2 area and consumes a relatively low 17.9 mW at 6 Gb/s at 1.0V. Therefore, the power efficiency is 2.98 mW/Gb/s. The simulated jitter is about 0.034 UI RMS given an input jitter value of 0.03 UI RMS, while the relatively constant loop bandwidth with the PD linearization technique is about 7.3-MHz regardless of the data-stream noise. Unlike the first receiver, the second proposed multichannel receiver was designed to reduce the hardware complexity of each lane. The receiver employs shared calibration logic among channels and yet achieves superior channel expandability with slim data lanes. A shared global calibration control, which is used in a forwarded clock receiver based on a multiphase delay-locked loop, accomplishes skew calibration, equalizer adaptation, and the phase lock of all channels during a calibration period, resulting in reduced hardware overhead and less area required by each data lane. The second forwarded clock receiver is designed in 90-nm CMOS technology. It achieves error-free eye openings of more than 0.5 UI across 9โˆ’ 28 inch Nelco 4000-6 microstrips at 4โˆ’ 7 Gb/s and more than 0.42 UI at data rates of up to 9 Gb/s. The data lane occupies only 0.152 mm2 and consumes 69.8 mW, while the rest of the receiver occupies 0.297 mm2 and consumes 56 mW at a data rate of 7 Gb/s and a supply voltage of 1.35 V.1. Introduction 1 1.1 Motivations 1.2 Thesis Organization 2. Previous Receivers for Serial-Data Communications 2.1 Classification of the Links 2.2 Clocking architecture of transceivers 2.3 Components of receiver 2.3.1 Channel loss 2.3.2 Equalizer 2.3.3 Clock and data recovery circuit 2.3.3.1. Basic architecture 2.3.3.2. Phase detector 2.3.3.2.1. Linear phase detector 2.3.3.2.2. Binary phase detector 2.3.3.3. Frequency detector 2.3.3.4. Charge pump 2.3.3.5. Voltage controlled oscillator and delay-line 2.3.4 Loop dynamics of PLL 2.3.5 Loop dynamics of DLL 3. The Proposed PLL-Based Receiver with Loop Linearization Technique 3.1 Introduction 3.2 Motivation 3.3 Overview of binary phase detection 3.4 The proposed BBPD linearization technique 3.4.1 Architecture of the proposed PLL-based receiver 3.4.2 Linearization technique of binary phase detection 3.4.3 Rotational pattern of sampling phase offset 3.5 PD gain analysis and optimization 3.6 Loop Dynamics of the 2nd-order CDR 3.7 Verification with the time-accurate behavioral simulation 3.8 Summary 4. The Proposed DLL-Based Receiver with Forwarded-Clock 4.1 Introduction 4.2 Motivation 4.3 Design consideration 4.4 Architecture of the proposed forwarded-clock receiver 4.5 Circuit description 4.5.1 Analog multi-phase DLL 4.5.2 Dual-input interpolating deley cells 4.5.3 Dedicated half-rate data samplers 4.5.4 Cherry-Hooper continuous-time linear equalizer 4.5.5 Equalizer adaptation and phase-lock scheme 4.6 Measurement results 5. Conclusion 6. BibliographyDocto

    High-speed equalization and transmission in electrical interconnections

    Get PDF
    The relentless growth of data traffic and increasing digital signal processing capabilities of integrated circuits (IC) are demanding ever faster chip-to-chip / chip-to-module serial electrical interconnects. As data rates increase, the signal quality after transmission over printed circuit board (PCB) interconnections is severely impaired. Frequency-dependent loss and crosstalk noise lead to a reduced eye opening, a reduced signal-to-noise ratio and an increased inter-symbol interference (ISI). This, in turn, requires the use of improved signal processing or PCB materials, in order to overcome the bandwidth (BW) limitations and to improve signal integrity. By applying an optimal combination of equalizer and receiver electronics together with BW-efficient modulation schemes, the transmission rate over serial electrical interconnections can be pushed further. At the start of this research, most industrial backplane connectors, meeting the IEEE and OIF specifications such as manufactured by e.g. FCI or TE connectivity, had operational capabilities of up to 25 Gb/s. This research was mainly performed under the IWT ShortTrack project. The goal of this research was to increase the transmission speed over electrical backplanes up to 100 Gb/s per channel for next-generation telecom systems and data centers. This requirement greatly surpassed the state-ofthe-art reported in previous publications, considering e.g. 25 Gb/s duobinary and 42.8 Gb/s PAM-4 transmission over a low-loss Megtron 6 electrical backplane using off-line processing. The successful implementation of the integrated transmitter (TX) and receiver (RX) (1) , clearly shows the feasibility of single lane interconnections beyond 80 Gb/s and opens the potential of realizing industrial 100 Gb/s links using a recent IC technology process. Besides the advancement of the state-of-the-art in the field of high-speed transceivers and backplane transmission systems, which led to several academic publications, the output of this work also attracts a lot of attention from the industry, showing the potential to commercialize the developed chipset and technologies used in this research for various applications: not only in high-speed electrical transmission links, but also in high-speed opto-electronic communications such as access, active optical cables and optical backplanes. In this dissertation, the background of this research, an overview of this work and the thesis organization are illustrated in Chapter 1. In Chapter 2, a system level analysis is presented, showing that the channel losses are limiting the transmission speed over backplanes. In order to enhance the serial data rate over backplanes and to eliminate the signal degradation, several technologies are discussed, such as signal equalization and modulation techniques. First, a prototype backplane channel, from project partner FCI, implemented with improved backplane connectors is characterized. Second, an integrated transversal filter as a feed-forward equalizer (FFE) is selected to perform the signal equalization, based on a comprehensive consideration of the backplane channel performance, equalization capabilities, implementation complexity and overall power consumption. NRZ, duobinary and PAM-4 are the three most common modulation schemes for ultra-high speed electrical backplane communication. After a system-level simulation and comparison, the duobinary format is selected due to its high BW efficiency and reasonable circuit complexity. Last, different IC technology processes are compared and the ST microelectronics BiCMOS9MW process (featuring a fT value of over 200 GHz) is selected, based on a trade-off between speed and chip cost. Meanwhile it also has a benefit for providing an integrated microstrip model, which is utilized for the delay elements of the FFE. Chapter 3 illustrates the chip design of the high-speed backplane TX, consisting of a multiplexer (MUX) and a 5-tap FFE. The 4:1 MUX combines four lower rate streams into a high-speed differential NRZ signal up to 100 Gb/s as the FFE input. The 5-tap FFE is implemented with a novel topology for improved testability, such that the FFE performance can be individually characterized, in both frequency- and time-domain, which also helps to perform the coefficient optimization of the FFE. Different configurations for the gain cell in the FFE are compared. The gilbert configuration shows most advantages, in both a good high-frequency performance and an easy way to implement positive / negative amplification. The total chip, including the MUX and the FFE, consumes 750mW from a 2.5V supply and occupies an area of 4.4mm ร— 1.4 mm. In Chapter 4, the TX chip is demonstrated up to 84 Gb/s. First, the FFE performance is characterized in the frequency domain, showing that the FFE is able to work up to 84 Gb/s using duobinary formats. Second, the combination of the MUX and the FFE is tested. The equalized TX outputs are captured after different channels, for both NRZ and duobinary signaling at speeds from 64 Gb/s to 84 Gb/s. Then, by applying the duobinary RX 2, a serial electrical transmission link is demonstrated across a pair of 10 cm coax cables and across a 5 cm FX-2 differential stripline. The 5-tap FFE compensates a total loss between the TX and the RX chips of about 13.5 dB at the Nyquist frequency, while the RX receives the equalized signal and decodes the duobinary signal to 4 quarter rate NRZ streams. This shows a chip-to-chip data link with a bit error rate (BER) lower than 10โˆ’11. Last, the electrical data transmission between the TX and the RX over two commercial backplanes is demonstrated. An error-free, serial duobinary transmission across a commercial Megtron 6, 11.5 inch backplane is demonstrated at 48 Gb/s, which indicates that duobinary outperforms NRZ for attaining higher speed or longer reach backplane applications. Later on, using an ExaMAXยฎ backplane demonstrator, duobinary transmission performance is verified and the maximum allowed channel loss at 40 Gb/s transmission is explored. The eye diagram and BER measurements over a backplane channel up to 26.25 inch are performed. The results show that at 40 Gb/s, a total channel loss up to 37 dB at the Nyquist frequency allows for error-free duobinary transmission, while a total channel loss of 42 dB was overcome with a BER below 10โˆ’8. An overview of the conclusions is summarized in Chapter 5, along with some suggestions for further research in this field. (1) The duobinary receiver was developed by my colleague Timothy De Keulenaer, as described in his PhD dissertation. (2) Described in the PhD dissertation of Timothy De Keulenaer

    ๋Œ€์—ญํญ ์ฆ๋Œ€ ๊ธฐ์ˆ ์„ ์ด์šฉํ•œ ์ „๋ ฅ ํšจ์œจ์  ๊ณ ์† ์†ก์‹  ์‹œ์Šคํ…œ ์„ค๊ณ„

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ(๋ฐ•์‚ฌ) -- ์„œ์šธ๋Œ€ํ•™๊ต๋Œ€ํ•™์› : ๊ณต๊ณผ๋Œ€ํ•™ ์ „๊ธฐยท์ •๋ณด๊ณตํ•™๋ถ€, 2022.2. ์ •๋•๊ท .The high-speed interconnect at the datacenter is being more crucial as 400 Gb Ethernet standards are developed. At the high data rate, channel loss re-quires bandwidth extension techniques for transmitters, even for short-reach channels. On the other hand, as the importance of east-to-west connection is rising, the data center architectures are switching to spine-leaf from traditional ones. In this trend, the number of short-reach optical interconnect is expected to be dominant. The vertical-cavity surface-emitting laser (VCSEL) is a com-monly used optical modulator for short-reach interconnect. However, since VCSEL has low bandwidth and nonlinearity, the optical transmitter also needs bandwidth-increasing techniques. Additionally, the power consumption of data centers reaches a point of concern to affect climate change. Therefore, this the-sis focuses on high-speed, power-efficient transmitters for data center applica-tions. Before the presenting circuit design, bandwidth extension techniques such as fractionally-spaced feed-forward equalizer (FFE), on-chip transmission line, inductive peaking, and T-coil are mathematically analyzed for their effec-tiveness. For the first chip, a power and area-efficient pulse-amplitude modulation 4 (PAM-4) transmitter using 3-tap FFE based on a slow-wave transmission line is presented. A passive delay line is adopted for generating an equalizer tap to overcome the high clocking power consumption. The transmission line achieves a high slow-wave factor of 15 with double floating metal shields around the differential coplanar waveguide. The transmitter includes 4:1 multi-plexers (MUXs) and a quadrature clock generator for high-speed data genera-tion in a quarter-rate system. The 4:1 MUX utilizes a 2-UI pulse generator, and the input configuration is determined by qualitative analysis. The chip is fabri-cated in 65 nm CMOS technology and occupies an area of 0.151 mm2. The proposed transmitter system exhibits an energy efficiency of 3.03 pJ/b at the data rate of 48 Gb/s with PAM-4 signaling. The second chip presents a power-efficient PAM-4 VCSEL transmitter using 3-tap FFE and negative-k T-coil. The phase interpolators (PIs) generate frac-tionally-spaced FFE tap and correct quadrature phase error. The PAM-4 com-bining 8:1 MUX is proposed rather than combining at output driver with double 4:1 MUXs to reduce serializing power consumption. T-coils at the internal and output node increase the bandwidth and remove inter-symbol interference (ISI). The negative-k T-coil at the output network increases the bandwidth 1.61 times than without T-coil. The VCSEL driver is placed on the high VSS domain for anode driving and power reduction. The chip is fabricated in 40 nm CMOS technology. The proposed VCSEL transmitter operates up to 48 Gb/s NRZ and 64 Gb/s PAM-4 with the power efficiency of 3.03 pJ/b and 2.09 pJ/b, respec-tively.400Gb ์ด๋”๋„ท ํ‘œ์ค€์ด ๊ฐœ๋ฐœ๋จ์— ๋”ฐ๋ผ ๋ฐ์ดํ„ฐ ์„ผํ„ฐ์˜ ๊ณ ์† ์ƒํ˜ธ ์—ฐ๊ฒฐ์ด ๋”์šฑ ์ค‘์š”ํ•ด์ง€๊ณ  ์žˆ๋‹ค. ๋†’์€ ๋ฐ์ดํ„ฐ ์†๋„์—์„œ์˜ ์ฑ„๋„ ์†์‹ค์— ์˜ํ•ด ๋‹จ๊ฑฐ๋ฆฌ ์ฑ„๋„์˜ ๊ฒฝ์šฐ์—๋„ ์†ก์‹ ๊ธฐ์— ๋Œ€ํ•œ ๋Œ€์—ญํญ ํ™•์žฅ ๊ธฐ์ˆ ์ด ํ•„์š”ํ•˜๋‹ค. ํ•œํŽธ, ๋ฐ์ดํ„ฐ ์„ผํ„ฐ ๋‚ด ๋™-์„œ ์—ฐ๊ฒฐ์˜ ์ค‘์š”์„ฑ์ด ๋†’์•„์ง€๋ฉด์„œ ๋ฐ์ดํ„ฐ ์„ผํ„ฐ ์•„ํ‚คํ…์ฒ˜๊ฐ€ ๊ธฐ์กด์˜ ์•„ํ‚คํ…์ฒ˜์—์„œ ์ŠคํŒŒ์ธ-๋ฆฌํ”„๋กœ ์ „ํ™˜๋˜๊ณ  ์žˆ๋‹ค. ์ด๋Ÿฌํ•œ ์ถ”์„ธ์—์„œ ๋‹จ๊ฑฐ๋ฆฌ ๊ด‘ํ•™ ์ธํ„ฐ์ปค๋„ฅํŠธ์˜ ์ˆ˜๊ฐ€ ์ ์ฐจ ์šฐ์„ธํ•ด์งˆ ๊ฒƒ์œผ๋กœ ์˜ˆ์ƒ๋œ๋‹ค. ์ˆ˜์ง ์บ๋น„ํ‹ฐ ํ‘œ๋ฉด ๋ฐฉ์ถœ ๋ ˆ์ด์ €(VCSEL)๋Š” ์ผ๋ฐ˜์ ์œผ๋กœ ๋‹จ๊ฑฐ๋ฆฌ ์ƒํ˜ธ ์—ฐ๊ฒฐ์„ ์œ„ํ•ด ์‚ฌ์šฉ๋˜๋Š” ๊ด‘ํ•™ ๋ชจ๋“ˆ๋ ˆ์ดํ„ฐ์ด๋‹ค. VCSEL์€ ๋‚ฎ์€ ๋Œ€์—ญํญ๊ณผ ๋น„์„ ํ˜•์„ฑ์„ ๊ฐ€์ง€๊ณ  ์žˆ๊ธฐ ๋•Œ๋ฌธ์—, ๊ด‘ ์†ก์‹ ๊ธฐ๋„ ๋Œ€์—ญํญ ์ฆ๊ฐ€ ๊ธฐ์ˆ ์„ ํ•„์š”๋กœ ํ•œ๋‹ค. ๋˜ํ•œ, ๋ฐ์ดํ„ฐ ์„ผํ„ฐ์˜ ์ „๋ ฅ ์†Œ๋น„๋Š” ๊ธฐํ›„ ๋ณ€ํ™”์— ์˜ํ–ฅ์„ ๋ฏธ์น  ์ˆ˜ ์žˆ๋Š” ์šฐ๋ ค ์ง€์ ์— ๋„๋‹ฌํ–ˆ๋‹ค. ๋”ฐ๋ผ์„œ, ๋ณธ ๋…ผ๋ฌธ์€ ๋ฐ์ดํ„ฐ ์„ผํ„ฐ ์‘์šฉ์„ ์œ„ํ•œ ๊ณ ์† ์ „๋ ฅ ํšจ์œจ์ ์ธ ์†ก์‹ ๊ธฐ์— ์ดˆ์ ์„ ๋งž์ถ”๊ณ  ์žˆ๋‹ค. ํšŒ๋กœ ์„ค๊ณ„๋ฅผ ์ œ์‹œํ•˜๊ธฐ ์ „์—, ๋ถ€๋ถ„ ๊ฐ„๊ฒฉ ํ”ผ๋“œ-ํฌ์›Œ๋“œ ์ดํ€„๋ผ์ด์ € (FFE), ์˜จ์นฉ ์ „์†ก์„ ๋กœ, ์ธ๋•ํ„ฐ, T-์ฝ”์ผ๊ณผ ๊ฐ™์€ ๋Œ€์—ญํญ ํ™•์žฅ ๊ธฐ์ˆ ์„ ์ˆ˜ํ•™์ ์œผ๋กœ ๋ถ„์„ํ•œ๋‹ค. ์ฒซ ๋ฒˆ์งธ ์นฉ์€ ์ €์†ํŒŒ ์ „์†ก์„ ๋กœ๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•œ 3-ํƒญ FFE๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ์ „๋ ฅ ๋ฐ ๋ฉด์  ํšจ์œจ์ ์ธ ํŽ„์Šค-์ง„ํญ-๋ณ€์กฐ 4(PAM-4) ์†ก์‹ ๊ธฐ๋ฅผ ์ œ์‹œํ•œ๋‹ค. ๋†’์€ ํด๋Ÿญ ์ „๋ ฅ ์†Œ๋น„๋ฅผ ๊ทน๋ณตํ•˜๊ธฐ ์œ„ํ•ด ์ดํ€„๋ผ์ด์ € ํƒญ ์ƒ์„ฑ์„ ์œ„ํ•ด ์ˆ˜๋™์†Œ์ž ์ง€์—ฐ ๋ผ์ธ์„ ์ฑ„ํƒํ–ˆ๋‹ค. ์ „์†ก ๋ผ์ธ์€ ์ฐจ๋™ ๋™์ผํ‰๋ฉด๋„ํŒŒ๊ด€ ์ฃผ์œ„์— ์ด์ค‘ ํ”Œ๋กœํŒ… ๊ธˆ์† ์ฐจํ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ 15์˜ ๋†’์€ ์ „๋‹ฌ์†๋„ ๊ฐ์‡ ๋ฅผ ๋‹ฌ์„ฑํ•œ๋‹ค. ์†ก์‹ ๊ธฐ์—๋Š” 4:1 ๋ฉ€ํ‹ฐํ”Œ๋ ‰์„œ(MUX)์™€ 4-์œ„์ƒ ํด๋Ÿญ ์ƒ์„ฑ๊ธฐ๊ฐ€ ํฌํ•จ๋˜์–ด ์žˆ๋‹ค. 4:1 MUX๋Š” 2-UI ํŽ„์Šค ๋ฐœ์ƒ๊ธฐ๋ฅผ ์‚ฌ์šฉํ•˜๋ฉฐ, ์ •์„ฑ ๋ถ„์„์— ์˜ํ•ด ์ž…๋ ฅ ๊ตฌ์„ฑ์ด ๊ฒฐ์ •๋œ๋‹ค. ์ด ์นฉ์€ 65 nm CMOS ๊ธฐ์ˆ ๋กœ ์ œ์ž‘๋˜์—ˆ์œผ๋ฉฐ 0.151 mm2์˜ ๋ฉด์ ์„ ์ฐจ์ง€ํ•œ๋‹ค. ์ œ์•ˆ๋œ ์†ก์‹ ๊ธฐ ์‹œ์Šคํ…œ์€ PAM-4 ์‹ ํ˜ธ์™€ ํ•จ๊ป˜ 48 Gb/s์˜ ๋ฐ์ดํ„ฐ ์†๋„์—์„œ 3.03 pJ/b์˜ ์—๋„ˆ์ง€ ํšจ์œจ์„ ๋ณด์—ฌ์ค€๋‹ค. ๋‘ ๋ฒˆ์งธ ์นฉ์—์„œ๋Š” 3-ํƒญ FFE ๋ฐ ์—ญํšŒ์ „ T-์ฝ”์ผ์„ ์‚ฌ์šฉํ•˜๋Š” ์ „๋ ฅ ํšจ์œจ์ ์ธ PAM-4 VCSEL ์†ก์‹ ๊ธฐ๋ฅผ ์ œ์‹œํ•œ๋‹ค. ์œ„์ƒ ๋ณด๊ฐ„๊ธฐ(PI)๋Š” ๋ถ€๋ถ„ ๊ฐ„๊ฒฉ FFE ํƒญ์„ ์ƒ์„ฑํ•˜๊ณ  4-์œ„์ƒ ํด๋Ÿญ ์˜ค๋ฅ˜๋ฅผ ์ˆ˜์ •ํ•˜๋Š” ๋ฐ ์‚ฌ์šฉ๋œ๋‹ค. ์ง๋ ฌํ™” ์ „๋ ฅ ์†Œ๋น„๋ฅผ ์ค„์ด๊ธฐ ์œ„ํ•ด ์ถœ๋ ฅ ๋“œ๋ผ์ด๋ฒ„์—์„œ MSB์™€ LSB๋ฅผ ๋‘ ๊ฐœ์˜ 4:1 MUX๋ฅผ ํ†ตํ•ด ๊ฒฐํ•ฉํ•˜๋Š” ๋Œ€์‹  8:1 MUX๋ฅผ ํ†ตํ•ด PAM-4๋กœ ๊ฒฐํ•ฉํ•˜๋Š” ํšŒ๋กœ๊ฐ€ ์ œ์•ˆ๋œ๋‹ค. ๋‚ด๋ถ€ ๋ฐ ์ถœ๋ ฅ ๋…ธ๋“œ์—์„œ T-์ฝ”์ผ์€ ๋Œ€์—ญํญ์„ ์ฆ๊ฐ€์‹œํ‚ค๊ณ  ๊ธฐํ˜ธ ๊ฐ„ ๊ฐ„์„ญ(ISI)์„ ์ œ๊ฑฐํ•œ๋‹ค. ์ถœ๋ ฅ ๋„คํŠธ์›Œํฌ์—์„œ ์—ญํšŒ์ „ T-์ฝ”์ผ์€ T-์ฝ”์ผ์ด ์—†๋Š” ๊ฒฝ์šฐ๋ณด๋‹ค ๋Œ€์—ญํญ์„ 1.61๋ฐฐ ์ฆ๊ฐ€์‹œํ‚จ๋‹ค. VCSEL ๋“œ๋ผ์ด๋ฒ„๋Š” ์–‘๊ทน ๊ตฌ๋™ ๋ฐ ์ „๋ ฅ ๊ฐ์†Œ๋ฅผ ์œ„ํ•ด ๋†’์€ VSS ๋„๋ฉ”์ธ์— ๋ฐฐ์น˜๋œ๋‹ค. ์ด ์นฉ์€ 40 nm CMOS ๊ธฐ์ˆ ๋กœ ์ œ์ž‘๋˜์—ˆ๋‹ค. ์ œ์•ˆ๋œ VCSEL ์†ก์‹ ๊ธฐ๋Š” ๊ฐ๊ฐ 3.03pJ/b์™€ 2.09pJ/b์˜ ์ „๋ ฅ ํšจ์œจ๋กœ ์ตœ๋Œ€ 48Gb/s NRZ์™€ 64Gb/s PAM-4๊นŒ์ง€ ์ž‘๋™ํ•œ๋‹ค.ABSTRACT I CONTENTS III LIST OF FIGURES V LIST OF TABLES IX CHAPTER 1 INTRODUCTION 1 1.1 MOTIVATION 1 1.2 THESIS ORGANIZATION 5 CHAPTER 2 BACKGROUND OF HIGH-SPEED INTERFACE 6 2.1 OVERVIEW 6 2.2 BASIS OF DATA CENTER ARCHITECTURE 9 2.3 SHORT-REACH INTERFACE STANDARDS 12 2.4 ANALYSES OF BANDWIDTH EXTENSION TECHNIQUES 16 2.4.1 FRACTIONALLY-SPACED FFE 16 2.4.2 TRANSMISSION LINE 21 2.4.3 INDUCTOR 24 2.4.4 T-COIL 33 CHAPTER 3 DESIGN OF 48 GB/S PAM-4 ELECTRICAL TRANSMITTER IN 65 NM CMOS 43 3.1 OVERVIEW 43 3.2 FFE BASED ON DOUBLE-SHIELDED COPLANAR WAVEGUIDE 46 3.2.1 BASIC CONCEPT 46 3.2.2 PROPOSED DOUBLE-SHIELDED COPLANAR WAVEGUIDE 47 3.3 DESIGN CONSIDERATION ON 4:1 MUX 50 3.4 PROPOSED PAM-4 ELECTRICAL TRANSMITTER 53 3.5 MEASUREMENT 57 CHAPTER 4 DESIGN OF 64 GB/S PAM-4 OPTICAL TRANSMITTER IN 40 NM CMOS 64 4.1 OVERVIEW 64 4.2 DESIGN CONSIDERATION OF OPTICAL TRANSMITTER 66 4.3 PROPOSED PAM-4 VCSEL TRANSMITTER 69 4.4 MEASUREMENT 82 CHAPTER 5 CONCLUSIONS 88 BIBLIOGRAPHY 90 ์ดˆ ๋ก 101๋ฐ•

    Toward realizing power scalable and energy proportional high-speed wireline links

    Get PDF
    Growing computational demand and proliferation of cloud computing has placed high-speed serial links at the center stage. Due to saturating energy efficiency improvements over the last five years, increasing the data throughput comes at the cost of power consumption. Conventionally, serial link power can be reduced by optimizing individual building blocks such as output drivers, receiver, or clock generation and distribution. However, this approach yields very limited efficiency improvement. This dissertation takes an alternative approach toward reducing the serial link power. Instead of optimizing the power of individual building blocks, power of the entire serial link is reduced by exploiting serial link usage by the applications. It has been demonstrated that serial links in servers are underutilized. On average, they are used only 15% of the time, i.e. these links are idle for approximately 85% of the time. Conventional links consume power during idle periods to maintain synchronization between the transmitter and the receiver. However, by powering-off the link when idle and powering it back when needed, power consumption of the serial link can be scaled proportionally to its utilization. This approach of rapid power state transitioning is known as the rapid-on/off approach. For the rapid-on/off to be effective, ideally the power-on time, off-state power, and power state transition energy must all be close to zero. However, in practice, it is very difficult to achieve these ideal conditions. Work presented in this dissertation addresses these challenges. When this research work was started (2011-12), there were only a couple of research papers available in the area of rapid-on/off links. Systematic study or design of a rapid power state transitioning in serial links was not available in the literature. Since rapid-on/off with nanoseconds granularity is not a standard in any wireline communication, even the popular test equipment does not support testing any such feature, neither any formal measurement methodology was available. All these circumstances made the beginning difficult. However, these challenges provided a unique opportunity to explore new architectural techniques and identify trade-offs. The key contributions of this dissertation are as follows. The first and foremost contribution is understanding the underlying limitations of saturating energy efficiency improvements in serial links and why there is a compelling need to find alternative ways to reduce the serial link power. The second contribution is to identify potential power saving techniques and evaluate the challenges they pose and the opportunities they present. The third contribution is the design of a 5Gb/s transmitter with a rapid-on/off feature. The transmitter achieves rapid-on/off capability in voltage mode output driver by using a fast-digital regulator, and in the clock multiplier by accurate frequency pre-setting and periodic reference insertion. To ease timing requirements, an improved edge replacement logic circuit for the clock multiplier is proposed. Mathematical modeling of power-on time as a function of various circuit parameters is also discussed. The proposed transmitter demonstrates energy proportional operation over wide variations of link utilization, and is, therefore, suitable for energy efficient links. Fabricated in 90nm CMOS technology, the voltage mode driver, and the clock multiplier achieve power-on-time of only 2ns and 10ns, respectively. This dissertation highlights key trade-off in the clock multiplier architecture, to achieve fast power-on-lock capability at the cost of jitter performance. The fourth contribution is the design of a 7GHz rapid-on/off LC-PLL based clock multi- plier. The phase locked loop (PLL) based multiplier was developed to overcome the limita- tions of the MDLL based approach. Proposed temperature compensated LC-PLL achieves power-on-lock in 1ns. The fifth and biggest contribution of this dissertation is the design of a 7Gb/s embedded clock transceiver, which achieves rapid-on/off capability in LC-PLL, current-mode transmit- ter and receiver. It was the first reported design of a complete transceiver, with an embedded clock architecture, having rapid-on/off capability. Background phase calibration technique in PLL and CDR phase calibration logic in the receiver enable instantaneous lock on power-on. The proposed transceiver demonstrates power scalability with a wide range of link utiliza- tion and, therefore, helps in improving overall system efficiency. Fabricated in 65nm CMOS technology, the 7Gb/s transceiver achieves power-on-lock in less than 20ns. The transceiver achieves power scaling by 44x (63.7mW-to-1.43mW) and energy efficiency degradation by only 2.2x (9.1pJ/bit-to-20.5pJ/bit), when the effective data rate (link utilization) changes by 100x (7Gb/s-to-70Mb/s). The sixth and final contribution is the design of a temperature sensor to compensate the frequency drifts due to temperature variations, during long power-off periods, in the fast power-on-lock LC-PLL. The proposed self-referenced VCO-based temperature sensor is designed with all digital logic gates and achieves low supply sensitivity. This sensor is suitable for integration in processor and DRAM environments. The proposed sensor works on the principle of directly converting temperature information to frequency and finally to digital bits. A novel sensing technique is proposed in which temperature information is acquired by creating a threshold voltage difference between the transistors used in the oscillators. Reduced supply sensitivity is achieved by employing junction capacitance, and the overhead of voltage regulators and an external ideal reference frequency is avoided. The effect of VCO phase noise on the sensor resolution is mathematically evaluated. Fabricated in the 65nm CMOS process, the prototype can operate with a supply ranging from 0.85V to 1.1V, and it achieves a supply sensitivity of 0.034oC/mV and an inaccuracy of ยฑ0.9oC and ยฑ2.3oC from 0-100oC after 2-point calibration, with and without static nonlinearity correction, respectively. It achieves a resolution of 0.3oC, resolution FoM of 0.3(nJ/conv)res2 , and measurement (conversion) time of 6.5ฮผs

    Integrated Circuit Design for Hybrid Optoelectronic Interconnects

    Get PDF
    This dissertation focuses on high-speed circuit design for the integration of hybrid optoelectronic interconnects. It bridges the gap between electronic circuit design and optical device design by seamlessly incorporating the compact Verilog-A model for optical components into the SPICE-like simulation environment, such as the Cadence design tool. Optical components fabricated in the IME 130nm SOI CMOS process are characterized. Corresponding compact Verilog-A models for Mach-Zehnder modulator (MZM) device are developed. With this approach, electro-optical co-design and hybrid simulation are made possible. The developed optical models are used for analyzing the system-level specifications of an MZM based optoelectronic transceiver link. Link power budgets for NRZ, PAM-4 and PAM-8 signaling modulations are simulated at system-level. The optimal transmitter extinction ratio (ER) is derived based on the required receiver\u27s minimum optical modulation amplitude (OMA). A limiting receiver is fabricated in the IBM 130 nm CMOS process. By side- by-side wire-bonding to a commercial high-speed InGaAs/InP PIN photodiode, we demonstrate that the hybrid optoelectronic limiting receiver can achieve the bit error rate (BER) of 10-12 with a -6.7 dBm sensitivity at 4 Gb/s. A full-rate, 4-channel 29-1 length parallel PRBS is fabricated in the IBM 130 nm SiGe BiCMOS process. Together with a 10 GHz phase locked loop (PLL) designed from system architecture to transistor level design, the PRBS is demonstrated operating at more than 10 Gb/s. Lessons learned from high-speed PCB design, dealing with signal integrity issue regarding to the PCB transmission line are summarized

    A 40-Gb/s Quarter-Rate SerDes Transmitter and Receiver Chipset in 65-nm CMOS

    Get PDF
    This paper presents a 40-Gb/s transmitter (TX) and receiver (RX) chipset for chip-to-chip communications in a 65-nm CMOS process. The TX implements a quarter-rate multi-multiplexer (MUX)-based four-tap feed-forward equalizer (FFE), where a charge-sharing-effect elimination technique is introduced into the 4:1 MUX to optimize its jitter performance and power efficiency. The RX employs a two-stage continuous-time linear equalizer as the analog front end and integrates a low-cost sign-based zero-forcing engine relying on edge-data correlation to automatically adjust the tap weights of the TX-FFE. By embedding low-pass filters with an adaptively adjusting bandwidth into the data-sampling path and adopting high-linearity compensating phase interpolators, the clock data recovery achieves both high jitter tolerance and low jitter generation. The fabricated TX and RX chipset delivers 40-Gb/s PRBS data at BER 16-dB loss at half-baud frequency, while consuming a total power of 370 mW
    • โ€ฆ
    corecore