5 research outputs found
Design of a High-Density, Low-Power Transceiver for Next-Generation HBM
Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, August 2020. Deog-Kyoon Jeong.
This thesis presents design techniques for a high-density, power-efficient transceiver for the next-generation high bandwidth memory (HBM). Unlike other memory interfaces, HBM uses a 3D-stacked package built with through-silicon vias (TSVs) and a silicon interposer. A transceiver for HBM must therefore solve the problems caused by the 3D-stacked package and the TSVs.
First, a data (DQ) receiver for HBM is proposed with a self-tracking loop that tracks the phase skew between DQ and the data strobe (DQS) caused by voltage or thermal drift. The self-tracking loop achieves low power and small area by utilizing an analog-assisted baud-rate phase detector. The proposed pulse-to-charge (PC) phase detector (PD) converts the phase skew to a voltage difference and detects the phase skew from that voltage difference. An offset calibration scheme that compensates for mismatch in the PD is also proposed. The calibration scheme operates without any additional sensing circuits by taking advantage of the write training of HBM. Fabricated in 65 nm CMOS, the DQ receiver shows a power efficiency of 370 fJ/b at 4.8 Gb/s and occupies 0.0056 mm². The experimental results show that the DQ receiver operates without any performance degradation under a ±10% supply variation.
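As a quick sanity check on the figures in this paragraph, energy per bit multiplied by data rate gives the average power the DQ receiver would draw. The sketch below is only unit arithmetic on the reported 370 fJ/b and 4.8 Gb/s. The function name is illustrative, and the resulting power value is derived here, not a number quoted in the abstract.

```python
def implied_power_mw(energy_fj_per_bit: float, rate_gbps: float) -> float:
    """Average power in mW implied by an energy-per-bit figure at a given data rate."""
    joules_per_bit = energy_fj_per_bit * 1e-15   # fJ -> J
    bits_per_second = rate_gbps * 1e9            # Gb/s -> b/s
    return joules_per_bit * bits_per_second * 1e3  # W -> mW


# Figures taken from the abstract: 370 fJ/b at 4.8 Gb/s.
dq_rx_power_mw = implied_power_mw(370.0, 4.8)
print(f"Implied DQ receiver power: {dq_rx_power_mw:.3f} mW")
```

Under this reading the receiver draws a little under 2 mW at the reported operating point.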
In a second prototype IC, a high-density transceiver for HBM with a feed-forward-equalizer (FFE)-combined crosstalk (XT) cancellation scheme is presented. To compensate for the XT, the transmitter pre-distorts the amplitude of the FFE output according to the XT. Since the proposed XT cancellation (XTC) scheme reuses the FFE already implemented to equalize the channel loss, the additional circuitry for the XTC is minimized. Thanks to the XTC scheme, the channel pitch can be significantly reduced, allowing a high channel density. Moreover, the 3D-staggered channel structure removes the ground layer between vertically adjacent channels, which further reduces the cross-sectional area of the channel per lane. The test chip, which includes 6 data lanes, is fabricated in 65 nm CMOS technology. The 6-mm channels are implemented on chip to emulate the silicon interposer between the HBM and the processor. The operation of the XTC scheme is verified by simultaneously transmitting 4-Gb/s data over 6 consecutive channels with a 0.5-um pitch, and the XTC scheme reduces the XT-induced jitter by up to 78%. The measurement results show that the transceiver achieves a throughput of 8 Gb/s/um. The transceiver occupies 0.05 mm² for 6 lanes and consumes 36.6 mW at 6 x 4 Gb/s.
Korean abstract (translated): This thesis proposes design methods for a high-density, low-power transceiver for next-generation HBM. First, a data receiver is proposed with a self-tracking loop that compensates for the phase difference between data and clock caused by voltage and temperature variation. The proposed self-tracking loop reduces power consumption and area by using a phase detector that operates at the same rate as the data rate. A method that effectively compensates for the phase detector offset by using the memory's write training process is also proposed. The proposed data receiver is fabricated in a 65 nm process and consumes 370 fJ/b at 4.8 Gb/s. It is also confirmed to operate stably under a 10% supply voltage variation.
Second, a high-density transceiver is proposed that uses a crosstalk compensation scheme combined with a feed-forward equalizer. The proposed transmitter compensates for crosstalk by distorting the transmitter output by an amount corresponding to the crosstalk magnitude. The proposed crosstalk compensation scheme minimizes additional circuitry by reusing the feed-forward equalizer implemented to compensate for channel loss. Because crosstalk can be compensated, the channel spacing is greatly reduced to realize high-density communication. To increase the density further, a stacked channel structure that removes the shielding layer between vertically adjacent channels is proposed. A prototype chip containing 6 transceivers is fabricated in a 65 nm process. A 6 mm channel is implemented on chip to emulate the silicon interposer channel between the HBM and the processor. The proposed crosstalk compensation scheme is verified by transmitting data simultaneously over 6 adjacent channels with a 0.5 um spacing, and it reduces crosstalk-induced jitter by up to 78%. The proposed transceiver has a throughput of 8 Gb/s/um, and the 6 transceivers consume 36.6 mW in total.
CHAPTER 1 INTRODUCTION
1.1 MOTIVATION
1.2 THESIS ORGANIZATION
CHAPTER 2 BACKGROUND ON HIGH-BANDWIDTH MEMORY
2.1 OVERVIEW
2.2 TRANSCEIVER ARCHITECTURE
2.3 READ/WRITE OPERATION
2.3.1 READ OPERATION
2.3.2 WRITE OPERATION
CHAPTER 3 BACKGROUNDS ON COUPLED WIRES
3.1 GENERALIZED MODEL
3.2 EFFECT OF CROSSTALK
CHAPTER 4 DQ RECEIVER WITH BAUD-RATE SELF-TRACKING LOOP
4.1 OVERVIEW
4.2 FEATURES OF DQ RECEIVER FOR HBM
4.3 PROPOSED PULSE-TO-CHARGE PHASE DETECTOR
4.3.1 OPERATION OF PULSE-TO-CHARGE PHASE DETECTOR
4.3.2 OFFSET CALIBRATION
4.3.3 OPERATION SEQUENCE
4.4 CIRCUIT IMPLEMENTATION
4.5 MEASUREMENT RESULT
CHAPTER 5 HIGH-DENSITY TRANSCEIVER FOR HBM WITH 3D-STAGGERED CHANNEL AND CROSSTALK CANCELLATION SCHEME
5.1 OVERVIEW
5.2 PROPOSED 3D-STAGGERED CHANNEL
5.2.1 IMPLEMENTATION OF 3D-STAGGERED CHANNEL
5.2.2 CHANNEL CHARACTERISTICS AND MODELING
5.3 PROPOSED FEED-FORWARD-EQUALIZER-COMBINED CROSSTALK CANCELLATION SCHEME
5.4 CIRCUIT IMPLEMENTATION
5.4.1 OVERALL ARCHITECTURE
5.4.2 TRANSMITTER WITH FFE-COMBINED XTC
5.4.3 RECEIVER
5.5 MEASUREMENT RESULT
CHAPTER 6 CONCLUSION
BIBLIOGRAPHY
ABSTRACT (IN KOREAN)
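The headline numbers of the HBM transceiver abstract above can be cross-checked with simple arithmetic. The sketch below assumes that the reported throughput density is the per-lane data rate divided by the channel pitch, which reproduces the quoted 8 Gb/s/um. The energy-per-bit value it prints is derived here from the reported 36.6 mW at 6 x 4 Gb/s and is not a figure stated in the abstract. Function names are illustrative.

```python
def throughput_density_gbps_per_um(lane_rate_gbps: float, pitch_um: float) -> float:
    """Data rate carried per micrometre of channel pitch (assumed definition)."""
    return lane_rate_gbps / pitch_um


def energy_pj_per_bit(power_mw: float, lanes: int, lane_rate_gbps: float) -> float:
    """Energy per bit in pJ/b for a multi-lane link."""
    total_bits_per_second = lanes * lane_rate_gbps * 1e9
    return (power_mw * 1e-3) / total_bits_per_second * 1e12  # J/b -> pJ/b


# Figures from the abstract: 4 Gb/s per lane, 0.5 um pitch, 36.6 mW for 6 lanes.
density = throughput_density_gbps_per_um(4.0, 0.5)
energy = energy_pj_per_bit(36.6, 6, 4.0)
print(f"Throughput density: {density:.1f} Gb/s/um")
print(f"Derived energy per bit: {energy:.3f} pJ/b")
```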
Design Techniques for High Pin Efficiency Wireline Transceivers
While the majority of wireline research investigates bandwidth improvement and how to overcome high channel loss, pin efficiency is also critical in high-performance wireline applications. This dissertation proposes two different implementations of high-pin-efficiency wireline transceivers. The first prototype is a 32Gb/s simultaneous bidirectional (SBD) transceiver that supports transmission and reception on the same channel at the same time, achieving twice the pin efficiency of unidirectional signaling. It includes an efficient low-swing voltage-mode driver with an R-gm hybrid for signal separation, combines the continuous-time linear equalizer (CTLE) and echo cancellation (EC) in a single stage, and employs a low-complexity 5/4X CDA system. A wide range of channels is supported through foreground adaptation of the EC finite impulse response (FIR) filter taps with a sign-sign least-mean-square (SSLMS) algorithm. Fabricated in TSMC 28-nm CMOS, the 32Gb/s SBD transceiver occupies area and achieves 16Gb/s unidirectional and 32Gb/s simultaneous bidirectional signaling. The 32Gb/s SBD operation consumes 1.83mW/Gb/s with 10.8dB channel loss at the Nyquist rate. The second prototype is an optical transmitter with a quantum-dot (QD) microring laser. It can support wavelength-division multiplexing, which enables high pin efficiency by packing multiple high-bandwidth signals onto one optical channel. The developed QD microring laser model accurately captures the intrinsic high-speed photonic dynamics and allows for future co-design of the circuits and the photonic device. To achieve a bandwidth higher than the intrinsic one, optical injection locking (OIL) and a 2-tap asymmetric feed-forward equalizer (FFE) are combined, enabling 22Gb/s operation at 3.2mW/Gb/s. The first hybrid-integration directly-modulated OIL QD microring laser system is demonstrated
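The abstract above mentions adapting the echo-cancellation FIR taps with a sign-sign LMS (SSLMS) algorithm. The following is a minimal software sketch of the generic SSLMS update, in which each tap moves by a fixed step in the direction given by the sign of the residual times the sign of the corresponding transmitted symbol. It is not the dissertation's hardware implementation: the echo response, tap count, step size, and symbol count are all hypothetical values chosen for illustration.

```python
import random


def sign(value: float) -> int:
    """Return -1, 0, or +1 according to the sign of value."""
    return (value > 0) - (value < 0)


def sslms_adapt(echo_taps, num_symbols=20000, mu=1e-3, seed=0):
    """Estimate an echo response with the sign-sign LMS update.

    echo_taps is the (hypothetical) true echo impulse response that the
    canceller should learn; mu is the fixed step applied to every tap.
    """
    rng = random.Random(seed)
    n = len(echo_taps)
    weights = [0.0] * n
    history = [0] * n  # most recent transmitted symbols, newest first
    for _ in range(num_symbols):
        history = [rng.choice((-1, 1))] + history[:-1]
        echo = sum(h * x for h, x in zip(echo_taps, history))
        residual = echo - sum(w * x for w, x in zip(weights, history))
        direction = sign(residual)
        # Sign-sign update: only the signs of the error and the data are used.
        weights = [w + mu * direction * sign(x) for w, x in zip(weights, history)]
    return weights


HYPOTHETICAL_ECHO = [0.5, -0.2, 0.1]  # assumed echo response, illustration only
estimated = sslms_adapt(HYPOTHETICAL_ECHO)
print([round(w, 3) for w in estimated])
```

Because only signs are used, the update needs no multipliers, which is the usual reason this variant is attractive for on-chip adaptation. The cost is that the taps dither around the solution by an amount on the order of the step size.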
Design of a Voltage- and Temperature-Insensitive Clock Path and Phase Error Corrector for High-Speed DRAM Interfaces
Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, February 2021. Deog-Kyoon Jeong.
To cope with problems caused by the high-speed operation of the dynamic random access memory (DRAM) interface, several approaches focused on the clock path of the DRAM are proposed. Two delay-locked loop (DLL) based schemes are proposed: a forwarded-clock (FC) receiver (RX) with a self-tracking loop, and a quadrature error corrector. In addition, an open-loop scheme is presented for drift compensation in the clock distribution. The open-loop scheme consumes less power and reduces design complexity.
The FC RX uses DLLs to compensate for voltage and temperature (VT) drift in unmatched memory interfaces. The self-tracking loop consists of two-stage cascaded DLLs so that it can operate in a DRAM environment. With the write training and the proposed DLL, the timing relationship between the data and the sampling clock is always optimal. The proposed scheme compensates for delay drift without relying on data transitions or re-training. The proposed FC RX is fabricated in a 65-nm CMOS process, and its active area, which contains 4 data lanes, is 0.0329 mm². After the write training is completed at a supply voltage of 1 V, the measured timing margin remains larger than 0.31 unit interval (UI) when the supply voltage drifts between 0.94 V and 1.06 V. At a data rate of 6.4 Gb/s, the proposed FC RX achieves an energy efficiency of 0.45 pJ/bit.
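To put the FC RX figures above into absolute terms, the timing margin in unit intervals can be converted to picoseconds and the supply drift expressed as a percentage of the training voltage. The sketch below takes one UI to be the reciprocal of the data rate. The converted values are derived here and are not quoted in the abstract. Function names are illustrative.

```python
def ui_to_ps(margin_ui: float, rate_gbps: float) -> float:
    """Convert a timing margin in unit intervals to picoseconds."""
    ui_ps = 1e3 / rate_gbps  # one UI in ps for a rate given in Gb/s
    return margin_ui * ui_ps


def drift_percent(voltage: float, training_voltage: float) -> float:
    """Supply deviation from the training voltage, in percent."""
    return (voltage - training_voltage) / training_voltage * 100.0


# Figures from the abstract: 0.31 UI at 6.4 Gb/s, 0.94 V to 1.06 V around 1 V.
margin_ps = ui_to_ps(0.31, 6.4)
low_drift = drift_percent(0.94, 1.0)
high_drift = drift_percent(1.06, 1.0)
print(f"Timing margin: {margin_ps:.2f} ps")
print(f"Supply drift range: {low_drift:.1f}% to {high_drift:+.1f}%")
```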
In contrast to the aforementioned scheme, an open-loop voltage drift compensation method is proposed to minimize power consumption and occupied area. The overall clock distribution is composed of a current mode logic (CML) path and a CMOS path. In the proposed scheme, the architecture of the CML-to-CMOS converter (C2C) and the inverter is changed to compensate for supply voltage drift. The bias generator provides bias voltages to the C2C and the inverters according to the supply voltage for delay adjustment. The proposed clock tree is fabricated in a 40 nm CMOS process, and the active area is 0.004 mm². When the supply voltage is modulated by a 1 MHz sinusoid with a 100 mV peak-to-peak swing centered at 1.1 V, applying the proposed scheme reduces the measured root-mean-square (RMS) jitter from 3.77 psRMS to 1.61 psRMS. At a 6 GHz output clock, the power consumption of the proposed scheme is 11.02 mW.
A DLL-based quadrature error corrector (QEC) with a wide correction range is proposed for DRAM whose clocks are distributed over several millimeters. The quadrature error is corrected by adjusting delay lines using information from the phase error detector. The proposed error correction method minimizes the jitter added by phase error correction by setting at least one of the delay lines in the quadrature clock path to the minimum delay. In addition, the asynchronous calibration on-off scheme reduces power consumption after calibration is complete. The proposed QEC is fabricated in a 40 nm CMOS process and has an active area of 0.048 mm². The proposed QEC exhibits a wide correctable error range of 101.6 ps, and the remaining phase errors are less than 2.18° for clocks from 0.8 GHz to 2.3 GHz. At 2.3 GHz, the QEC contributes 0.53 psRMS of jitter. Also at 2.3 GHz, the power consumption is reduced from 8.89 mW to 3.39 mW when the calibration is off.
Korean abstract (translated): This thesis proposes three circuits to cope with problems that can arise in the clock path as the speed of dynamic random access memory (DRAM) increases. Two of the proposed circuits use a delay-locked loop, and the remaining one uses an open-loop approach to reduce area and power consumption. In the unmatched receiver structure of DRAM, a delay mismatch between the data path and the clock path reduces the setup and hold times as voltage and temperature change, and a delay-locked loop is used to solve this problem. The proposed delay-locked loop circuit is divided into two delay-locked loops so that it can operate in a DRAM environment. In addition, the initial write training places the data and clock at the optimal position in terms of timing margin. The proposed scheme therefore does not need data transition information. The chip, fabricated in a 65-nm CMOS process, has an energy efficiency of 0.45 pJ/bit at 6.4 Gb/s. When the write training and the delay-locked loop are locked at 1 V and the supply voltage then changes from 0.94 V to 1.06 V, the timing margin remains larger than 0.31 UI.
The next proposed circuit compensates for the change in clock path delay caused by voltage variation in the clock distribution tree, using an open-loop approach unlike the scheme above. The structures of the inverter and the CML-to-CMOS converter in the conventional clock path are changed so that the delay can be adjusted with a bias voltage that the bias generation circuit varies according to the supply voltage. For the chip fabricated in a 40-nm CMOS process, the measured power consumption at a 6 GHz clock is 11.02 mW. When the supply voltage is modulated by a 1 MHz sinusoid with 100 mV peak-to-peak centered at 1.1 V, the jitter is reduced from 3.77 psRMS with the conventional scheme to 1.61 psRMS with the proposed scheme.
In the transmitter structure of DRAM, the phase error between multi-phase clocks reduces the data valid window of the transmitted data. If a delay-locked loop is introduced to solve this, the added delay increases the jitter of the phase-corrected clock. To minimize the added jitter, this thesis proposes a phase correction circuit that minimizes the delay added by phase correction. A method that turns off the phase error correction circuit asynchronously to the input clock is also proposed to reduce power consumption in the idle state. For the chip fabricated in a 40-nm CMOS process, the phase correction range is 101.6 ps, and the phase error of the corrector's output clock is less than 2.18° over the operating frequency range from 0.8 GHz to 2.3 GHz. The jitter added by the proposed phase correction circuit is 0.53 psRMS at 2.3 GHz, and the power consumption decreases from 8.89 mW with the correction circuit on to 3.39 mW with it off.
Chapter 1 Introduction
1.1 Motivation
1.2 Thesis Organization
Chapter 2 Background on DRAM Interface
2.1 Overview
2.2 Memory Interface
Chapter 3 Background on DLL
3.1 Overview
3.2 Building Blocks
3.2.1 Delay Line
3.2.2 Phase Detector
3.2.3 Charge Pump
3.2.4 Loop filter
Chapter 4 Forwarded-Clock Receiver with DLL-based Self-tracking Loop for Unmatched Memory Interfaces
4.1 Overview
4.2 Proposed Separated DLL
4.2.1 Operation of the Proposed Separated DLL
4.2.2 Operation of the Digital Loop Filter in DLL
4.3 Circuit Implementation
4.4 Measurement Results
4.4.1 Measurement Setup and Sequence
4.4.2 VT Drift Measurement and Simulation
Chapter 5 Open-loop-based Voltage Drift Compensation in Clock Distribution
5.1 Overview
5.2 Prior Works
5.3 Voltage Drift Compensation Method
5.4 Circuit Implementation
5.5 Measurement Results
Chapter 6 Quadrature Error Corrector with Minimum Total Delay Tracking
6.1 Overview
6.2 Prior Works
6.3 Quadrature Error Correction Method
6.4 Circuit Implementation
6.5 Measurement Results
Chapter 7 Conclusion
Bibliography
Abstract (in Korean)
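The quadrature error corrector abstract above states that at least one delay line in the quadrature clock path is kept at the minimum delay. One way to read this is as a normalization: only the relative delays between the phases set the quadrature relationship, so any delay common to all paths can be removed. The sketch below illustrates that idea on integer delay codes. It is an interpretation for illustration, not the thesis's circuit or algorithm, and the phase names and code values are hypothetical.

```python
def normalize_delay_codes(codes, min_code=0):
    """Shift all delay codes by a common offset so the smallest equals min_code.

    Relative delays between phases are preserved, so the phase spacing is
    unchanged while the total inserted delay is as small as possible.
    """
    offset = min(codes.values()) - min_code
    return {phase: code - offset for phase, code in codes.items()}


# Hypothetical delay-line codes for the four quadrature phases.
raw_codes = {"I": 7, "Q": 12, "IB": 9, "QB": 15}
normalized = normalize_delay_codes(raw_codes)
print(normalized)
```

The motivation given in the abstract is that extra delay in the clock path adds jitter, so a common offset contributes jitter without helping the correction.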
Circuit Techniques for Low-Power, Area-Efficient Wireline Transceiver Design
Thesis (Ph.D.) -- Seoul National University Graduate School: School of Electrical Engineering and Computer Science, August 2016. Deog-Kyoon Jeong.
In this thesis, novel circuit techniques for low-power and area-efficient wireline transceivers are proposed, including a phase-locked loop (PLL) based on a two-stage ring oscillator, a scalable voltage-mode transmitter, and a forwarded-clock (FC) receiver with delay-locked-loop (DLL)-based per-pin deskew.
First, a two-stage ring PLL is presented that provides a four-phase, high-speed clock for a quarter-rate TX in order to minimize power consumption. Several analyses and verification techniques, ranging from the clocking architectures for a high-speed TX to oscillation failures in a two-stage ring oscillator, are addressed in this thesis. A tri-state-inverter-based frequency divider and an AC-coupled clock buffer are used for high-speed operation with minimal power and area overheads. The proposed PLL, fabricated in 65-nm CMOS technology, occupies an active area of 0.009 mm² and exhibits an integrated RMS jitter of 414 fs from 10 kHz to 100 MHz while consuming 7.6 mW from a 1.2-V supply at 10 GHz. The resulting figure-of-merit is -238.8 dB, which surpasses that of the state-of-the-art ring PLLs by 4 dB.
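The figure-of-merit quoted for the PLL can be reproduced from the reported jitter and power. The abstract does not state which formula it uses, so the sketch below assumes the widely used jitter-power definition, FoM = 10·log10[(σ / 1 s)² · (P / 1 mW)]. With the reported 414 fs and 7.6 mW this gives about -238.85 dB, consistent with the quoted -238.8 dB. The function name is illustrative.

```python
import math


def pll_jitter_fom_db(rms_jitter_s: float, power_mw: float) -> float:
    """Jitter-power figure of merit in dB (assumed definition, see lead-in)."""
    return 10.0 * math.log10((rms_jitter_s / 1.0) ** 2 * (power_mw / 1.0))


# Figures from the abstract: 414 fs integrated RMS jitter, 7.6 mW.
fom_db = pll_jitter_fom_db(414e-15, 7.6)
print(f"FoM: {fom_db:.2f} dB")
```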
Second, a voltage-mode (VM) transmitter is presented that offers a wide operating range of 6 to 32 Gb/s, controllable pre-emphasis equalization and output voltage swing without altering the output impedance, and power-supply scalability. A quarter-rate clocking architecture is employed in order to maximize the scalability and energy efficiency across a variety of operating conditions. A P-over-N VM driver is used for CMOS compatibility and for the wide voltage-swing range required by various I/O standards. Two supply regulators calibrate the output impedance of the VM driver across the wide swing and pre-emphasis range. A single phase-locked loop is used to provide a wide frequency range of 1.5 to 8 GHz. The prototype chip is fabricated in 65-nm CMOS technology and occupies an active area of 0.48 × 0.36 mm². The proposed transmitter achieves a 250-to-600-mV single-ended swing and exhibits an energy efficiency of 2.10 to 2.93 pJ/bit across data rates of 6 to 32 Gb/s.
Finally, this thesis describes a power- and area-efficient FC receiver and includes an analysis of the jitter tolerance of the FC receiver. In the proposed design, jitter tolerance is maximized according to the analysis by employing DLL-based de-skewing. A sample-swapping bang-bang phase detector (SS-BBPD) eliminates the stuck locking caused by the finite delay range of the voltage-controlled delay line (VCDL) and also halves the required delay range of the VCDL. The proposed FC receiver is fabricated in 65-nm CMOS technology and occupies an active area of 0.025 mm². At a data rate of 12.5 Gb/s, the proposed FC receiver exhibits an energy efficiency of 0.36 pJ/bit and tolerates 1.4-UIpp sinusoidal jitter at 300 MHz.
Chapter 1. Introduction
1.1. Motivation
1.2. Thesis organization
Chapter 2. Phase-Locked Loop Based on Two-Stage Ring Oscillator
2.1. Overview
2.2. Background and Analysis of a Two-stage Ring Oscillator
2.3. Circuit Implementation of The Proposed PLL
2.4. Measurement Results
Chapter 3. A Scalable Voltage-Mode Transmitter
3.1. Overview
3.2. Design Considerations on a Scalable Serial Link Transmitter
3.3. Circuit Implementation
3.4. Measurement Results
Chapter 4. Delay-Locked Loop Based Forwarded-Clock Receiver
4.1. Overview
4.2. Timing and Data Recovery in a Serial Link
4.3. DLL-Based Forwarded-Clock Receiver Characteristics
4.4. Circuit Implementation
4.5. Measurement Results
Chapter 5. Conclusion
Appendix
Appendix A. Design flow to optimize a high-speed ring oscillator
Appendix B. Reflection Issues in N-over-N Voltage-Mode Driver
Appendix C. Analysis on output swing and power consumption of the P-over-N voltage-mode driver
Appendix D. Loop Dynamics of DLL
Bibliography
Abstract