Jitter Tracking Bandwidth Optimization Using Active-Inductor-Based Bandpass Filtering in High-speed Forwarded Clock Transceivers
Inter-chip input/output (I/O) bandwidth demand, which has grown rapidly with integrated-circuit scaling, requires high-performance I/O links with per-pin data rates of several Gb/s. High-speed links that employ a forwarded-clock architecture let the receiver track jitter common to data and clock from low to high frequencies. Because of clock-to-data skew, however, high-frequency jitter on the sampling clock and on the data arrives out of phase at the receiver, which reduces the timing margin and limits the data rate. The jitter tracking bandwidth (JTB) between data and clock should therefore be optimized to compensate for the clock-to-data skew. System-level analysis shows that a wide JTB tuning range is needed to compensate for different amounts of skew.
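The skew argument above can be made concrete with a minimal Python sketch. The model rests on three assumptions, none taken from the paper. The same sinusoidal jitter rides on data and forwarded clock. The clock path's jitter transfer is a single-pole low-pass whose corner is the JTB. The clock reaches the sampler `skew_s` later than the data. The function name and the 200-ps skew are illustrative choices.

```python
import cmath
import math


def residual_jitter_gain(f_jit_hz: float, skew_s: float, jtb_hz: float) -> float:
    """Relative data-to-clock jitter at the sampler, per unit of common input jitter.

    Model (an assumption): identical sinusoidal jitter on data and clock, a
    clock-path jitter transfer that is a single-pole low-pass with corner
    jtb_hz, and a clock path that is skew_s seconds longer than the data path.
    """
    h_clk = 1.0 / (1.0 + 1j * f_jit_hz / jtb_hz)              # clock-path jitter transfer
    skew_term = cmath.exp(-2j * math.pi * f_jit_hz * skew_s)  # extra delay of the clock path
    return abs(1.0 - h_clk * skew_term)


if __name__ == "__main__":
    skew = 200e-12  # hypothetical 200-ps clock-to-data skew
    freqs = (10e6, 500e6, 1.25e9, 2.5e9)
    for jtb in (40e6, 150e6, 600e6, 1e12):  # 1e12 Hz approximates "no filtering"
        gains = [residual_jitter_gain(f, skew, jtb) for f in freqs]
        print(f"JTB = {jtb / 1e6:>9.0f} MHz: " + "  ".join(f"{g:.2f}" for g in gains))
```

Without filtering (a very large JTB), the residual reaches twice the input jitter where f·skew = 0.5, because clock and data jitter are then fully out of phase. A finite JTB pulls the residual toward 1x at frequencies well above the corner and still tracks low-frequency jitter. The best JTB therefore depends on the skew.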
Bandpass filtering on the forwarded-clock path can control the JTB through the filter's quality factor Q. This work introduces a method that uses bandpass filtering to optimize the JTB in high-speed forwarded-clock transceivers. It implements the clock receiver as an active-inductor-based bandpass filter, which offers low-voltage operation, low power consumption, and small area. Simulation results show that the designed filter provides a controllable JTB over 40 to 600 MHz. The bandpass filter is implemented in an IBM 90-nm CMOS process.
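The link between Q and JTB can be sketched in Python under textbook assumptions rather than the paper's circuit:

- The active inductor is an ideal gyrator-C, so L = C/(gm1·gm2).
- The filter behaves as a parallel RLC tank.
- The tank's phase response near resonance acts like a single-pole low-pass with corner f0/(2Q), half the 3-dB bandwidth. This is a high-Q approximation.

All component values and function names are hypothetical.

```python
import math


def active_inductor_h(c_gyr_f: float, gm1_s: float, gm2_s: float) -> float:
    """Equivalent inductance of an ideal gyrator-C active inductor: L = C / (gm1 * gm2)."""
    return c_gyr_f / (gm1_s * gm2_s)


def tank_center_hz(l_h: float, c_f: float) -> float:
    """Resonant frequency of an LC tank: f0 = 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_h * c_f))


def tank_q(r_par_ohm: float, l_h: float, c_f: float) -> float:
    """Quality factor of a parallel RLC tank: Q = R * sqrt(C / L)."""
    return r_par_ohm * math.sqrt(c_f / l_h)


def jtb_from_q(f0_hz: float, q: float) -> float:
    """Approximate jitter tracking bandwidth of a high-Q bandpass: f0 / (2Q)."""
    return f0_hz / (2.0 * q)


if __name__ == "__main__":
    # Hypothetical values: 20 fF gyrator capacitance, 2 mS transconductors,
    # 200 fF tank capacitance; sweep the effective parallel resistance.
    l_eq = active_inductor_h(20e-15, 2e-3, 2e-3)
    c_tank = 200e-15
    f0 = tank_center_hz(l_eq, c_tank)
    for r_par in (1e3, 2e3, 5e3):
        q = tank_q(r_par, l_eq, c_tank)
        print(f"R = {r_par:.0f} ohm  Q = {q:.1f}  JTB ~ {jtb_from_q(f0, q) / 1e6:.0f} MHz")
```

In this model, raising the effective parallel resistance raises Q and narrows the JTB. That is the control knob the abstract describes.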
Design of a Voltage- and Temperature-Insensitive Clock Path and Phase Error Corrector for High-Speed DRAM Interfaces
Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, February 2021. Advisor: Deog-Kyoon Jeong.
To cope with problems caused by the high-speed operation of the dynamic random access memory (DRAM) interface, this thesis proposes several approaches focused on the DRAM clock path. Two delay-locked loop (DLL) based schemes are proposed: a forwarded-clock (FC) receiver (RX) with a self-tracking loop, and a quadrature error corrector. In addition, an open-loop scheme is presented for drift compensation in the clock distribution; it consumes less power and reduces design complexity.
The FC RX uses DLLs to compensate for voltage and temperature (VT) drift in unmatched memory interfaces. The self-tracking loop consists of two cascaded DLL stages so that it can operate in a DRAM environment. With write training and the proposed DLL, the timing relationship between the data and the sampling clock is kept optimal. The proposed scheme compensates for delay drift without relying on data transitions or re-training. The proposed FC RX is fabricated in a 65-nm CMOS process and occupies an active area of 0.0329 mm² including four data lanes. After write training at a 1-V supply, the measured timing margin remains larger than 0.31 unit interval (UI) as the supply voltage drifts between 0.94 V and 1.06 V. At a data rate of 6.4 Gb/s, the proposed FC RX achieves an energy efficiency of 0.45 pJ/bit.
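A behavioural Python sketch shows how a bang-bang digital DLL with an accumulate-and-dump loop filter can keep the clock delay tracking a slow VT-induced drift after training. The sketch simplifies the thesis in several hypothetical ways: a single loop rather than two cascaded DLLs, a 1-ps LSB, an 8-vote filter, and illustrative names.

```python
def track_drift(target_delays_ps, step_ps=1.0, init_code=0, filter_len=8):
    """Bang-bang digital DLL with an accumulate-and-dump loop filter.

    target_delays_ps: required clock-path delay at each update (ps), e.g. drifting with VT.
    Each update the phase detector reports early/late; the loop filter accumulates
    these votes and moves the delay code by one LSB only after `filter_len` net votes.
    Returns the delay setting (ps) after every update.
    """
    code, acc, history = init_code, 0, []
    for target in target_delays_ps:
        error = target - code * step_ps
        if error > 0:
            acc += 1          # sampling clock early: vote for more delay
        elif error < 0:
            acc -= 1          # sampling clock late: vote for less delay
        if acc >= filter_len:
            code, acc = code + 1, 0
        elif acc <= -filter_len:
            code, acc = code - 1, 0
        history.append(code * step_ps)
    return history


if __name__ == "__main__":
    # Hypothetical drift: required delay ramps from 40 ps to 60 ps over 2000 updates.
    targets = [40.0 + 20.0 * i / 2000 for i in range(2000)]
    settings = track_drift(targets, init_code=40)
    print(max(abs(s - t) for s, t in zip(settings, targets)))
```

The loop tracks as long as the drift per update stays below its slew capability of one LSB per `filter_len` updates. The filter length trades tracking speed against dithering around the lock point.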
In contrast to the DLL-based scheme above, an open-loop voltage drift compensation method is proposed to minimize power consumption and area. The clock distribution consists of a current-mode logic (CML) path and a CMOS path. In the proposed scheme, the CML-to-CMOS converter (C2C) and the inverters are modified to compensate for supply voltage drift. A bias generator supplies them with bias voltages that follow the supply voltage, which adjusts their delay. The proposed clock tree is fabricated in a 40-nm CMOS process and occupies an active area of 0.004 mm². When the 1.1-V supply is modulated by a 1-MHz sinusoid with a 100-mV peak-to-peak swing, the proposed scheme reduces the measured root-mean-square (RMS) jitter from 3.77 psRMS to 1.61 psRMS. With a 6-GHz output clock, the proposed scheme consumes 11.02 mW.
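A back-of-the-envelope sketch helps in reading the jitter numbers above. Suppose a 100-mV peak-to-peak sinusoidal ripple modulates the clock-path delay linearly. The resulting timing jitter is then also sinusoidal, and its RMS value is its peak-to-peak swing divided by 2√2. Treating the whole measured RMS jitter as supply-induced gives an upper bound on delay sensitivity, with and without compensation. That treatment is an assumption, because the measurement also contains random jitter. Function names are illustrative.

```python
import math


def sinusoidal_rms(peak_to_peak: float) -> float:
    """RMS value of a sinusoid with the given peak-to-peak amplitude."""
    return peak_to_peak / (2.0 * math.sqrt(2.0))


def implied_sensitivity_ps_per_mv(jitter_rms_ps: float, supply_pp_mv: float) -> float:
    """Clock-path delay sensitivity to supply, assuming all RMS jitter is the
    sinusoidal component caused by the supply ripple (an upper-bound estimate,
    since random jitter is also included in a measured RMS value)."""
    delay_pp_ps = jitter_rms_ps * 2.0 * math.sqrt(2.0)
    return delay_pp_ps / supply_pp_mv


if __name__ == "__main__":
    for label, rms in (("conventional", 3.77), ("compensated", 1.61)):
        print(label, round(implied_sensitivity_ps_per_mv(rms, 100.0), 4), "ps/mV")
```

Under that assumption, 3.77 psRMS corresponds to roughly 0.11 ps/mV and 1.61 psRMS to roughly 0.046 ps/mV.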
A DLL-based quadrature error corrector (QEC) with a wide correction range is proposed for DRAMs whose clocks are distributed over several millimeters. The quadrature error is corrected by adjusting delay lines using information from a phase error detector. The correction method minimizes the jitter added by phase-error correction by setting at least one of the delay lines in the quadrature clock path to its minimum delay. In addition, an asynchronous calibration on/off scheme reduces power consumption once calibration is complete. The proposed QEC is fabricated in a 40-nm CMOS process and occupies an active area of 0.048 mm². It exhibits a wide correctable error range of 101.6 ps, and the remaining phase errors are less than 2.18° for clock frequencies from 0.8 GHz to 2.3 GHz. At 2.3 GHz, the QEC contributes 0.53 psRMS of jitter, and its power consumption drops from 8.89 mW to 3.39 mW when calibration is turned off.
This thesis proposes three circuits to address problems that can arise in the clock path as the speed of dynamic random access memory (DRAM) increases. Two of the proposed circuits use a delay-locked loop (DLL), and the remaining one uses an open-loop approach to reduce area and power consumption. In the unmatched receiver structure of DRAM, the delay mismatch between the data path and the clock path makes the setup and hold times shrink as voltage and temperature vary; a DLL is used to solve this problem. The proposed DLL circuit is divided into two DLLs so that it operates in a DRAM environment. In addition, initial write training places the data and the clock at the optimal position in terms of timing margin, so the proposed scheme does not need data-transition information. The chip fabricated in a 65-nm CMOS process achieves an energy efficiency of 0.45 pJ/bit at 6.4 Gb/s. With write training and the DLL locked at 1 V, the timing margin stayed above 0.31 UI as the supply voltage was varied from 0.94 V to 1.06 V.
The next proposed circuit uses an open-loop approach, unlike the previous scheme, to compensate for the change in clock-path delay caused by voltage variation in the clock distribution tree. The structures of the inverter and the CML-to-CMOS converter in the conventional clock path are modified so that their delay can be adjusted by bias voltages that the bias generator produces according to the supply voltage. The chip fabricated in a 40-nm CMOS process consumes 11.02 mW at a 6-GHz clock. When the supply voltage is modulated around 1.1 V by a 1-MHz sinusoid with a 100-mV peak-to-peak swing, the jitter with the proposed scheme is reduced from 3.77 psRMS (conventional scheme) to 1.61 psRMS.
In the DRAM transmitter structure, phase error between the multiphase clocks reduces the data valid window of the transmitted data. Introducing a DLL to correct this increases the jitter of the phase-corrected clock because of the added delay. This thesis therefore proposes a phase corrector that minimizes the delay added by phase correction, so as to minimize the added jitter. It also proposes a method to turn off the phase-error correction circuit asynchronously to the input clock to reduce power consumption in the idle state. The chip fabricated in a 40-nm CMOS process has a phase correction range of 101.6 ps, and the phase error of the corrector's output clock is less than 2.18° over the operating frequency range from 0.8 GHz to 2.3 GHz. The jitter added by the proposed phase corrector is 0.53 psRMS at 2.3 GHz, and turning off the correction circuit reduces the power consumption from 8.89 mW (correction on) to 3.39 mW.
Chapter 1 Introduction
1.1 Motivation
1.2 Thesis Organization
Chapter 2 Background on DRAM Interface
2.1 Overview
2.2 Memory Interface
Chapter 3 Background on DLL
3.1 Overview
3.2 Building Blocks
3.2.1 Delay Line
3.2.2 Phase Detector
3.2.3 Charge Pump
3.2.4 Loop Filter
Chapter 4 Forwarded-Clock Receiver with DLL-based Self-tracking Loop for Unmatched Memory Interfaces
4.1 Overview
4.2 Proposed Separated DLL
4.2.1 Operation of the Proposed Separated DLL
4.2.2 Operation of the Digital Loop Filter in DLL
4.3 Circuit Implementation
4.4 Measurement Results
4.4.1 Measurement Setup and Sequence
4.4.2 VT Drift Measurement and Simulation
Chapter 5 Open-loop-based Voltage Drift Compensation in Clock Distribution
5.1 Overview
5.2 Prior Works
5.3 Voltage Drift Compensation Method
5.4 Circuit Implementation
5.5 Measurement Results
Chapter 6 Quadrature Error Corrector with Minimum Total Delay Tracking
6.1 Overview
6.2 Prior Works
6.3 Quadrature Error Correction Method
6.4 Circuit Implementation
6.5 Measurement Results
Chapter 7 Conclusion
Bibliography
Abstract (in Korean)
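The QEC's minimum-delay rule described in the abstract above can be sketched in Python. Delay lines can only add delay, so aligning four quadrature edges means delaying every edge to match the latest one. Using that latest edge as the reference leaves its delay line at zero, which gives the smallest total delay and hence the least added jitter. The edge-error model, the function names, and the use of 101.6 ps as a range check are illustrative assumptions, not the thesis's circuit.

```python
def min_delay_correction(edge_errors_ps, max_range_ps=None):
    """Delay-line settings that re-align quadrature edges with minimum total delay.

    edge_errors_ps: offset of each phase's edge from its ideal position
    (k * T/4), in ps; positive means the edge is late. Delay lines can only
    add delay, so every edge is delayed to line up with the latest one:
    delay_i = max(errors) - error_i, which leaves the latest path at zero.
    """
    latest = max(edge_errors_ps)
    delays = [latest - e for e in edge_errors_ps]
    if max_range_ps is not None and max(delays) > max_range_ps:
        raise ValueError("phase error exceeds the delay-line correction range")
    return delays


def phase_error_deg(error_ps: float, clock_hz: float) -> float:
    """Express a timing error as degrees of the clock period."""
    return 360.0 * error_ps * 1e-12 * clock_hz


if __name__ == "__main__":
    # Hypothetical measured edge errors for the 0/90/180/270-degree phases.
    print(min_delay_correction([0.0, 5.0, -3.0, 2.0], max_range_ps=101.6))
    print(phase_error_deg(2.633, 2.3e9))
```

For the hypothetical errors shown, the settings are 5, 0, 8 and 3 ps, so the latest phase keeps its delay line at zero. The second helper shows that the 2.18° residual quoted at 2.3 GHz corresponds to roughly 2.6 ps.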
Architectures and Circuits Leveraging Injection-Locked Oscillators for Ultra-Low Voltage Clock Synthesis and Reference-less Receivers for Dense Chip-to-Chip Communications
High-performance computing is critical for scientific discovery and economic competitiveness. An extreme-scale computing system with 1000x the performance of today's petaflop machines will exhibit massive parallelism on multiple fronts, from thousands of computational units on a single processor to thousands of processors in a single data center. A key challenge in enabling such massively parallel extreme-scale computing is power. The problem is not the power of the computation itself but the power needed to transport data from one chip to another at high enough rates. This thesis presents architectures and techniques that achieve high data rates with low power and small area in a dense very-short-reach (VSR) chip-to-chip (C2C) communication network. High-speed serial communication at ultra-low supply voltages improves energy efficiency and lowers the power envelope of an exaflop-class system. One focus of this thesis is clock synthesis for such energy-efficient interconnects operating at high speeds and ultra-low supplies. A sub-integer clock-frequency synthesizer is presented that combines three elements. A multi-phase injection-locked ring-oscillator-based prescaler enables operation at an ultra-low supply voltage of 0.5 V. Phase-switching-based programmable division provides sub-integer clock-frequency synthesis. Automatic calibration ensures injection lock. A record speed of 9 GHz has been demonstrated at 0.5 V in 45-nm SOI CMOS. The synthesizer consumes 3.5 mW at 9.12 GHz and occupies 0.052 mm², with an output phase noise of -100 dBc/Hz at 1-MHz offset and an RMS jitter of 325 fs. It achieves an area-inclusive figure of merit (FoMA) of -186.5 dB in the 45-nm SOI CMOS process. This thesis also describes a receiver with a reference-less clocking architecture for high-density VSR-C2C links.
This architecture simplifies clock-tree planning in dense extreme-scale computing environments. Its high-bandwidth CDR enables spread-spectrum clocking (SSC) for suppressing EMI and relaxes TX jitter requirements. It features a clock-less DFE and a high-bandwidth CDR based on master-slave ILOs for phase generation and rotation. The RX is implemented in 14-nm CMOS and characterized at 19 Gb/s. It is 1.5x faster than previous reference-less embedded-oscillator-based designs, with a jitter tolerance bandwidth greater than 100 MHz, and it recovers error-free data over VSR-C2C channels. It achieves a power efficiency of 2.9 pJ/b while recovering error-free data (BER 200MHz and the INL of the ILO-based phase rotator (32 steps/UI) is <1 LSB. Lastly, this thesis develops a time-domain, delay-based model of injection locking to describe injection-locking phenomena in nonharmonic oscillators. The model is used to predict the locking bandwidth and the locking dynamics of the locked oscillator. Its predictions are verified against simulations and measurements of a four-stage differential ring oscillator. The model is further used to predict three behaviors: the injection locking of a single-ended CMOS-inverter-based ring oscillator, the lock range of a multi-phase injection-locked ring-oscillator-based prescaler, and the dynamics of tracking injection phase perturbations in injection-locked master-slave oscillators. These results demonstrate its versatility for any nonharmonic oscillator.
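The abstract does not define the figure of merit quoted for the synthesizer. This Python sketch assumes the conventional phase-noise FoM, PN(Δf) - 20·log10(f0/Δf) + 10·log10(P/1 mW), with an area term 10·log10(A/1 mm²) added for FoMA. On that assumption the quoted numbers can be checked as follows. Function names are illustrative.

```python
import math


def fom_pn_db(pn_dbc_hz: float, f0_hz: float, df_hz: float, power_mw: float) -> float:
    """Phase-noise FoM: PN(df) - 20*log10(f0/df) + 10*log10(P / 1 mW)."""
    return pn_dbc_hz - 20.0 * math.log10(f0_hz / df_hz) + 10.0 * math.log10(power_mw)


def fom_a_db(pn_dbc_hz: float, f0_hz: float, df_hz: float,
             power_mw: float, area_mm2: float) -> float:
    """Area-inclusive FoM: FoM + 10*log10(A / 1 mm^2)."""
    return fom_pn_db(pn_dbc_hz, f0_hz, df_hz, power_mw) + 10.0 * math.log10(area_mm2)


if __name__ == "__main__":
    for f0 in (9e9, 9.12e9):
        print(f0, round(fom_a_db(-100.0, f0, 1e6, 3.5, 0.052), 2))
```

The inputs are -100 dBc/Hz at 1 MHz, 3.5 mW and 0.052 mm². With them the formula gives about -186.5 dB for f0 = 9 GHz and about -186.6 dB for f0 = 9.12 GHz. That matches the quoted value if the area is expressed in mm².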
A Wideband Injection-Locking Scheme and Quadrature Phase Generation in 65-nm CMOS
A novel technique for wideband injection locking in an LC oscillator is proposed. Phase-locked-loop and injection-locking elements are combined symbiotically to achieve a wide locking range while retaining the simplicity of injection locking. The method does not require a phase-frequency detector or a loop filter to achieve phase lock. A mathematical analysis of the system is presented, and an expression for the new locking range is derived. A locking range of 13.4 to 17.2 GHz and an average jitter tracking bandwidth of up to 400 MHz were measured in a high-Q LC oscillator. The architecture is used to generate quadrature phases from a single clock without any frequency division. It also provides high-frequency jitter filtering while retaining the low-frequency correlated jitter essential for forwarded-clock receivers.
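For context, Adler's equation for weak injection into an LC oscillator sketches the conventional limit that the proposed scheme widens. The phase difference φ between oscillator and injection obeys dφ/dt = Δω - ω_L·sin φ. Lock holds only while |Δω| ≤ ω_L, with ω_L ≈ (ω0/2Q)(I_inj/I_osc). This Python sketch uses forward-Euler integration with illustrative names and values. It shows the classic textbook model, not the PLL-assisted technique of the paper.

```python
import math


def adler_lock_range_hz(f0_hz: float, q: float, i_inj: float, i_osc: float) -> float:
    """One-sided lock range (Hz) of an LC oscillator, Adler's weak-injection approximation."""
    return f0_hz / (2.0 * q) * (i_inj / i_osc)


def simulate_adler(delta_w: float, w_lock: float,
                   t_end: float = 100e-9, dt: float = 1e-11) -> float:
    """Forward-Euler integration of d(phi)/dt = delta_w - w_lock*sin(phi); returns final phi (rad)."""
    phi = 0.0
    for _ in range(int(t_end / dt)):
        phi += dt * (delta_w - w_lock * math.sin(phi))
    return phi


if __name__ == "__main__":
    w_l = 2.0 * math.pi * 100e6            # hypothetical 100-MHz lock range
    print(simulate_adler(0.5 * w_l, w_l))   # inside the range: settles to asin(0.5)
    print(simulate_adler(2.0 * w_l, w_l))   # outside: phase keeps slipping
```

A high Q improves jitter filtering but directly shrinks ω_L in this model. That tension is what a wideband injection-locking technique has to overcome.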
Circuit Techniques for Low-Power, Low-Area Wireline Transceiver Design
Thesis (Ph.D.) -- Seoul National University Graduate School: School of Electrical Engineering and Computer Science, August 2016. Advisor: Deog-Kyoon Jeong.
This thesis proposes novel circuit techniques for low-power and area-efficient wireline transceivers. They include a phase-locked loop (PLL) based on a two-stage ring oscillator, a scalable voltage-mode transmitter, and a forwarded-clock (FC) receiver with delay-locked-loop (DLL) based per-pin deskew.
First, a two-stage ring PLL is presented that provides a four-phase, high-speed clock for a quarter-rate TX in order to minimize power consumption. Several analyses and verification techniques are addressed, ranging from clocking architectures for a high-speed TX to oscillation failures in a two-stage ring oscillator. A tri-state-inverter-based frequency divider and an AC-coupled clock buffer are used for high-speed operation with minimal power and area overhead. The proposed PLL is fabricated in 65-nm CMOS technology and occupies an active area of 0.009 mm². It achieves an integrated RMS jitter of 414 fs (10 kHz to 100 MHz) while consuming 7.6 mW from a 1.2-V supply at 10 GHz. The resulting figure of merit is -238.8 dB, which surpasses that of state-of-the-art ring PLLs by 4 dB.
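The -238.8-dB figure is consistent with the widely used PLL jitter figure of merit, FoM = 10·log10((σ_t/1 s)²·(P/1 mW)). That definition is an assumption, since the abstract does not state it. A one-line Python check with an illustrative function name:

```python
import math


def jitter_fom_db(rms_jitter_s: float, power_mw: float) -> float:
    """PLL jitter figure of merit: 10*log10((sigma_t / 1 s)**2 * (P / 1 mW))."""
    return 10.0 * math.log10(rms_jitter_s ** 2 * power_mw)


if __name__ == "__main__":
    print(round(jitter_fom_db(414e-15, 7.6), 2))  # jitter and power quoted in the abstract
```

With 414 fs and 7.6 mW this evaluates to about -238.85 dB, in line with the quoted value.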
Second, a voltage-mode (VM) transmitter is presented that offers a wide operating range of 6 to 32 Gb/s and power-supply scalability. It also controls pre-emphasis equalization and output voltage swing without altering the output impedance. A quarter-rate clocking architecture is employed to maximize scalability and energy efficiency across a variety of operating conditions. A P-over-N VM driver is used for CMOS compatibility and for the wide voltage-swing range required by various I/O standards. Two supply regulators calibrate the output impedance of the VM driver across the wide swing and pre-emphasis range. A single phase-locked loop provides a wide frequency range of 1.5 to 8 GHz. The prototype chip is fabricated in 65-nm CMOS technology and occupies an active area of 0.48 × 0.36 mm². The proposed transmitter achieves a 250-to-600-mV single-ended swing and an energy efficiency of 2.10 to 2.93 pJ/bit across data rates of 6 to 32 Gb/s.
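The abstract does not say how pre-emphasis is realized. As a generic illustration only, and not the thesis's driver, a 2-tap transmit FFE computes y[n] = c0·x[n] + c1·x[n-1]. With a negative post-cursor c1 and |c0| + |c1| = 1, the peak swing stays unchanged while repeated bits are attenuated. Tap values and function names are hypothetical.

```python
import math


def ffe_2tap(bits, c0: float, c1: float):
    """Transmit levels of a 2-tap FFE: y[n] = c0*x[n] + c1*x[n-1], with x in {-1, +1}."""
    prev = 0.0                    # no previous symbol before the first bit
    out = []
    for b in bits:
        x = 1.0 if b else -1.0
        out.append(c0 * x + c1 * prev)
        prev = x
    return out


def de_emphasis_db(c0: float, c1: float) -> float:
    """Steady-state level relative to transition level, in dB (c1 < 0 de-emphasizes)."""
    return 20.0 * math.log10(abs(c0 + c1) / abs(c0 - c1))


if __name__ == "__main__":
    print(ffe_2tap([1, 1, 1, 0], 0.8, -0.2))
    print(round(de_emphasis_db(0.8, -0.2), 2))
```

With c0 = 0.8 and c1 = -0.2, transitions swing to ±1.0 and repeated bits to ±0.6, about 4.4 dB of de-emphasis. In segmented voltage-mode drivers, tap weights are commonly set by how many driver slices follow each tap. This keeps the total output impedance fixed.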
Last, this thesis describes a power- and area-efficient FC receiver and analyzes the jitter tolerance of FC receivers. Guided by that analysis, the proposed design maximizes jitter tolerance by employing DLL-based deskewing. A sample-swapping bang-bang phase detector (SS-BBPD) eliminates the stuck locking caused by the finite delay range of the voltage-controlled delay line (VCDL), and it also halves the required delay range of the VCDL. The proposed FC receiver is fabricated in 65-nm CMOS technology and occupies an active area of 0.025 mm². At a data rate of 12.5 Gb/s, it exhibits an energy efficiency of 0.36 pJ/bit and tolerates 1.4-UIpp sinusoidal jitter at 300 MHz.
Chapter 1. Introduction
1.1. Motivation
1.2. Thesis organization
Chapter 2. Phase-Locked Loop Based on Two-Stage Ring Oscillator
2.1. Overview
2.2. Background and Analysis of a Two-stage Ring Oscillator
2.3. Circuit Implementation of The Proposed PLL
2.4. Measurement Results
Chapter 3. A Scalable Voltage-Mode Transmitter
3.1. Overview
3.2. Design Considerations on a Scalable Serial Link Transmitter
3.3. Circuit Implementation
3.4. Measurement Results
Chapter 4. Delay-Locked Loop Based Forwarded-Clock Receiver
4.1. Overview
4.2. Timing and Data Recovery in a Serial Link
4.3. DLL-Based Forwarded-Clock Receiver Characteristics
4.4. Circuit Implementation
4.5. Measurement Results
Chapter 5. Conclusion
Appendix
Appendix A. Design flow to optimize a high-speed ring oscillator
Appendix B. Reflection Issues in N-over-N Voltage-Mode Driver
Appendix C. Analysis on output swing and power consumption of the P-over-N voltage-mode driver
Appendix D. Loop Dynamics of DLL
Bibliography
Abstract
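A first-order sketch shows why deskewing maximizes jitter tolerance in a forwarded-clock receiver such as the one described above. Suppose data and clock carry the same sinusoidal jitter, but the clock path is skewed by Δ relative to the data. The sampler then sees relative jitter of 2·|sin(π·f·Δ)| times the input. The tolerable input jitter is therefore the timing margin divided by that factor. The margin value and function name are hypothetical, and the model ignores any jitter filtering in the clock path.

```python
import math


def fc_jitter_tolerance_uipp(f_jit_hz: float, skew_s: float, margin_uipp: float) -> float:
    """Tolerable common sinusoidal jitter (UIpp) for a forwarded-clock receiver.

    Data and clock carry the same jitter; a residual clock-to-data skew turns it
    into relative jitter of 2*|sin(pi*f*skew)| times the input. Returns math.inf
    when the skew is zero and the common jitter cancels completely.
    """
    relative_gain = 2.0 * abs(math.sin(math.pi * f_jit_hz * skew_s))
    if relative_gain == 0.0:
        return math.inf
    return margin_uipp / relative_gain


if __name__ == "__main__":
    ui = 80e-12  # one UI at 12.5 Gb/s
    for skew_ui in (0.0, 0.5, 1.0, 2.0):
        print(skew_ui, fc_jitter_tolerance_uipp(300e6, skew_ui * ui, 0.5))
```

Driving the residual skew toward zero, which is the job of the per-pin DLL deskew, raises the tolerance at every jitter frequency in this model.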
A Study on Multichannel Receivers with Improved Data-Channel Expandability and Loop Linearity
Thesis (Ph.D.) -- Seoul National University Graduate School: School of Electrical Engineering and Computer Science, February 2013. Advisor: Deog-Kyoon Jeong.
Two types of serial data communication receivers that adopt a multichannel architecture for high aggregate I/O bandwidth are presented. Two techniques for collaboration and sharing among channels are proposed to enhance the loop linearity and the channel expandability of multichannel receivers, respectively.
The first proposed receiver employs a collaborative timing recovery scheme that shares the outputs of all phase detectors (PDs) among channels to extract common timing information, and it uses the multilevel PAM-4 signaling architecture. The shared timing information is processed by a common global loop filter and is used to update the phase of the voltage-controlled oscillator with better rejection of per-channel noise. In addition to collaborative timing recovery, a simple linearization technique for binary PDs is proposed. The technique realizes a high-rate oversampling PD at a hardware cost equivalent to that of a conventional 2x-oversampling clock and data recovery circuit. The first receiver, which exploits the collaborative timing recovery architecture, is designed in 45-nm CMOS technology. A single data lane occupies 0.195 mm² and consumes a relatively low 17.9 mW at 6 Gb/s from a 1.0-V supply, a power efficiency of 2.98 mW/Gb/s. The simulated output jitter is about 0.034 UI RMS for an input jitter of 0.03 UI RMS. With the PD linearization technique, the loop bandwidth stays roughly constant at about 7.3 MHz regardless of data-stream noise.
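The abstract does not detail the linearization technique, although the table of contents mentions a rotational pattern of sampling phase offset. A generic Python illustration shows why spreading sampling offsets linearizes a binary PD. Averaging sign decisions taken at several evenly spread offsets turns the PD's step characteristic into a staircase that approximates a linear ramp over the offset span. Offsets, span, and function names are hypothetical, and phase error is in UI.

```python
def linearized_bbpd(phase_err: float, offsets) -> float:
    """Average of binary (sign) PD decisions taken at several sampling offsets."""
    def sign(v: float) -> int:
        return (v > 0) - (v < 0)
    return sum(sign(phase_err + o) for o in offsets) / len(offsets)


def uniform_offsets(n: int, span: float):
    """n offsets at the cell centres of [-span/2, +span/2]."""
    return [(k + 0.5) * span / n - span / 2.0 for k in range(n)]


if __name__ == "__main__":
    offs = uniform_offsets(8, 0.2)  # hypothetical: 8 offsets over 0.2 UI
    for err in (-0.1, -0.05, 0.0, 0.05, 0.1):
        print(err, linearized_bbpd(err, offs))
```

Inside the offset span, the averaged output grows roughly in proportion to the phase error, so the PD gain no longer depends on input noise the way a single bang-bang decision does.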
Unlike the first receiver, the second proposed multichannel receiver is designed to reduce the hardware complexity of each lane. It employs calibration logic shared among channels and achieves superior channel expandability with slim data lanes. The receiver is a forwarded-clock design based on a multiphase delay-locked loop. Its shared global calibration control performs skew calibration, equalizer adaptation, and phase locking for all channels during a calibration period, which reduces hardware overhead and the area required by each data lane. The second forwarded-clock receiver is designed in 90-nm CMOS technology. It achieves error-free eye openings of more than 0.5 UI across 9- to 28-inch Nelco 4000-6 microstrips at 4 to 7 Gb/s, and more than 0.42 UI at data rates of up to 9 Gb/s. The data lane occupies only 0.152 mm² and consumes 69.8 mW, while the rest of the receiver occupies 0.297 mm² and consumes 56 mW at a data rate of 7 Gb/s and a supply voltage of 1.35 V.
1. Introduction
1.1 Motivations
1.2 Thesis Organization
2. Previous Receivers for Serial-Data Communications
2.1 Classification of the Links
2.2 Clocking architecture of transceivers
2.3 Components of receiver
2.3.1 Channel loss
2.3.2 Equalizer
2.3.3 Clock and data recovery circuit
2.3.3.1. Basic architecture
2.3.3.2. Phase detector
2.3.3.2.1. Linear phase detector
2.3.3.2.2. Binary phase detector
2.3.3.3. Frequency detector
2.3.3.4. Charge pump
2.3.3.5. Voltage controlled oscillator and delay-line
2.3.4 Loop dynamics of PLL
2.3.5 Loop dynamics of DLL
3. The Proposed PLL-Based Receiver with Loop Linearization Technique
3.1 Introduction
3.2 Motivation
3.3 Overview of binary phase detection
3.4 The proposed BBPD linearization technique
3.4.1 Architecture of the proposed PLL-based receiver
3.4.2 Linearization technique of binary phase detection
3.4.3 Rotational pattern of sampling phase offset
3.5 PD gain analysis and optimization
3.6 Loop Dynamics of the 2nd-order CDR
3.7 Verification with the time-accurate behavioral simulation
3.8 Summary
4. The Proposed DLL-Based Receiver with Forwarded-Clock
4.1 Introduction
4.2 Motivation
4.3 Design consideration
4.4 Architecture of the proposed forwarded-clock receiver
4.5 Circuit description
4.5.1 Analog multi-phase DLL
4.5.2 Dual-input interpolating delay cells
4.5.3 Dedicated half-rate data samplers
4.5.4 Cherry-Hooper continuous-time linear equalizer
4.5.5 Equalizer adaptation and phase-lock scheme
4.6 Measurement results
5. Conclusion
6. Bibliography
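The sharing idea behind the second receiver described above can be illustrated with a hypothetical calibration engine. It visits the lanes one at a time and binary-searches each lane's skew code, so each lane only needs a comparator and a code register. The per-lane check functions, the 6-bit code width, and the search itself are assumptions. The thesis's controller also performs equalizer adaptation and phase locking, which are not modeled here.

```python
def shared_skew_calibration(lane_checks, code_bits: int = 6):
    """Time-multiplexed skew calibration with one shared binary-search engine.

    lane_checks[i](code) is a hypothetical per-lane comparator that returns True
    once lane i's sampling phase, delayed by `code`, is at or past its target;
    it must be monotonic in `code`. Returns the smallest passing code per lane.
    """
    codes = []
    for check in lane_checks:          # one lane at a time shares the same logic
        lo, hi = 0, (1 << code_bits) - 1
        while lo < hi:
            mid = (lo + hi) // 2
            if check(mid):
                hi = mid
            else:
                lo = mid + 1
        codes.append(lo)
    return codes


if __name__ == "__main__":
    targets = [3, 17, 40, 63]  # hypothetical per-lane skew codes
    checks = [lambda code, t=t: code >= t for t in targets]
    print(shared_skew_calibration(checks))
```

Because only one lane is calibrated at a time, the search logic is instantiated once, and adding a lane costs little more than its comparator and register.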
Trigger and Timing Distributions using the TTC-PON and GBT Bridge Connection in ALICE for the LHC Run 3 Upgrade
The ALICE experiment at CERN is preparing for a major upgrade for the third
data-taking period (Run 3), when the high-luminosity phase of the Large
Hadron Collider (LHC) starts. The increase in beam luminosity will raise the
interaction rate and push the data acquisition rate above 3 TB/s.
To acquire data for all events and handle the increased data
rate, the readout electronics architecture must move from triggered
to trigger-less acquisition. In this new architecture, a
dedicated electronics block called the Common Readout Unit (CRU) is defined to
act as a nodal communication point for detector data aggregation and as a
distribution point for timing, trigger and control (TTC) information. TTC
information in the upgraded trigger-less readout architecture uses two
asynchronous high-speed serial link connections: the TTC-PON and the GBT. We
have carried out a study to evaluate the quality of the embedded timing signals
forwarded by the CRU to the connected electronics using the TTC-PON and GBT
bridge connection. We have used four performance metrics to characterize the
communication bridge: (a) the latency added by the firmware logic, (b) the jitter
cleaning effect of the PLL on the timing signal, (c) BER analysis for
quantitative measurement of signal quality, and (d) the effect of optical
transceiver parameter settings on the signal strength. The reliability of
the bridge connection in maintaining the phase consistency of timing signals is
studied by performing multiple iterations of power on/off, firmware
upgrade and reset assertion/de-assertion cycles (PFR cycles). The test results
are presented and discussed with respect to the performance of the TTC-PON and GBT
bridge communication chain using the CRU prototype and its compliance with the
ALICE timing requirements.
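The BER procedure above is not specified in detail. As a generic illustration, the standard zero-error confidence bound says that after N error-free bits, BER < target holds at confidence CL when N ≥ -ln(1 - CL)/target (Poisson approximation). The helper names and the line-rate argument are illustrative, not taken from the study.

```python
import math


def bits_for_zero_error_ber(target_ber: float, confidence: float) -> float:
    """Error-free bits needed to claim BER < target_ber at the given confidence."""
    return -math.log(1.0 - confidence) / target_ber


def ber_test_duration_s(target_ber: float, confidence: float, line_rate_bps: float) -> float:
    """Time needed to accumulate those bits at a given line rate."""
    return bits_for_zero_error_ber(target_ber, confidence) / line_rate_bps


if __name__ == "__main__":
    print(bits_for_zero_error_ber(1e-12, 0.95))
    print(ber_test_duration_s(1e-12, 0.95, 1e9) / 60.0, "minutes at 1 Gb/s")
```

For BER < 10⁻¹² at 95% confidence this is about 3 × 10¹² bits, or roughly 50 minutes at a hypothetical 1 Gb/s.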
Towards the Design of Robust High-Speed and Power Efficient Short Reach Photonic Links
In 2014, approximately eight trillion transistors were fabricated every second thanks to improvements in integration density and fabrication processes. This increase in integration and functionality has also made systems on chip (SoC) and high-performance computing (HPC) possible. Electrical interconnects presently dominate the very-short-reach interconnect landscape (< 5 cm) in these applications. This, however, is expected to change, because electrical interconnects need impedance matching, offer limited pin density, and suffer frequency-dependent loss that leads to intersymbol interference. To address this, researchers have increasingly explored integrated silicon photonics, which is compatible with current CMOS processes and opens many possibilities for short-reach applications.
Many see optical interconnects as the high-speed link solution for applications ranging from intra-data-center links (~200 m) down to module or even chip scales (< 2 cm). Attractive properties of optical interconnects, such as low loss and multiplexing capability, are expected to enable future exascale high-performance computers (10^18 calculations per second). Forecasts predict that by 2025 photonics at the smallest levels of the interconnect hierarchy will be a reality. This thesis presents three novel research projects that all work toward increasing the robustness and cost-efficiency of short-reach optical links. They address three parts of the optical link: the interconnect, the receiver, and the photodiode.
The first topic of this thesis is exploratory work on using an optical multiplexing technique, mode-division multiplexing (MDM), to carry multiple data lanes together with a forwarded clock for very-short-reach applications. The second topic is a novel reconfigurable CMOS receiver that maps the clock signal onto the interconnect lane with the lowest optical crosstalk in an MDM source-synchronous link. The receiver adapts the electronic chip to the needs of the photonic one. By leveraging the more robust electronic integrated circuit, link solutions can be tuned to the photonic chips on a die-by-die basis. The third topic is a novel photodetector that uses photonic grating couplers to redirect vertically incident light to the horizontal direction. The light is thus applied along the entire length of a p-n junction, which improves the responsivity and speed of the device. Experimental results for this photodetector at 35 Gb/s have been published, showing it to be the fastest all-silicon photodetector reported in the literature at the time of publication.
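The clock-to-lane mapping in the second topic amounts to choosing the lane with the least crosstalk. A minimal Python sketch follows. It assumes that a measured crosstalk matrix is available and that "lowest crosstalk" means the smallest worst-case coupling from any other lane; an aggregate power sum would be an equally reasonable criterion. The matrix and function name are hypothetical.

```python
def pick_clock_lane(xtalk_db):
    """Choose the lane whose worst-case crosstalk from the other lanes is lowest.

    xtalk_db[i][j]: hypothetical measured crosstalk (dB, more negative is better)
    coupled from lane j into lane i; diagonal entries are ignored.
    """
    best_lane, best_worst = None, None
    for i, row in enumerate(xtalk_db):
        worst = max(v for j, v in enumerate(row) if j != i)
        if best_worst is None or worst < best_worst:
            best_lane, best_worst = i, worst
    return best_lane


if __name__ == "__main__":
    matrix = [[0.0, -20.0, -18.0],
              [-25.0, 0.0, -30.0],
              [-15.0, -22.0, 0.0]]
    print(pick_clock_lane(matrix))
```

Because the clock sits on every data lane's timing path, putting it on the cleanest lane protects all lanes at once. A reconfigurable receiver can make this choice per die.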
Analysis and design on low-power multi-Gb/s serial links
High-speed serial links are critical components for addressing the growing demand for I/O bandwidth in next-generation computing applications, such as many-core systems, backplanes, and optical data communications. Thanks to continued process scaling and circuit innovations, today's CMOS serial link transceivers can achieve tens of Gb/s per pin. However, their reported power efficiency has improved much more slowly than their data rates. As a result, aggregate I/O power is increasing and will exceed the power budget if the trend toward more off-chip bandwidth continues.
In this work, a system-level statistical analysis of serial links is first described and used to compare the link performance of non-return-to-zero (2-PAM) signaling with a higher-order modulation (duobinary) scheme. The method enables fast and accurate BER distribution simulation of serial link transceivers. It includes channel and circuit imperfections such as finite pulse rise/fall time, duty-cycle variation, and both receiver and transmitter forwarded-clock jitter.
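A statistical link analysis like the one above convolves ISI, noise, and jitter distributions to obtain BER. Its simplest special case is binary NRZ with Gaussian noise on each level. That case reduces to BER = 0.5·erfc(Q/√2), with Q = (μ1 - μ0)/(σ1 + σ0) at the optimum threshold. This Python sketch covers only that special case, not the thesis's simulator, which also handles duobinary signaling and the listed imperfections. Function names are illustrative.

```python
import math


def ber_from_q(q: float) -> float:
    """BER of binary NRZ with Gaussian noise: 0.5 * erfc(Q / sqrt(2))."""
    return 0.5 * math.erfc(q / math.sqrt(2.0))


def q_from_eye(mu1: float, mu0: float, sigma1: float, sigma0: float) -> float:
    """Q factor at the optimum decision threshold: (mu1 - mu0) / (sigma1 + sigma0)."""
    return (mu1 - mu0) / (sigma1 + sigma0)


if __name__ == "__main__":
    for q in (6.0, 7.0, 7.5):
        print(q, ber_from_q(q))
```

A Q of about 7 corresponds to a BER near 10⁻¹², the target quoted for the test chips that follow.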
Second, to address link power efficiency, two test chips have been implemented. The first is a quad-lane, 6.4-7.2 Gb/s serial link receiver prototype using a forwarded-clock architecture. It uses a novel phase deskew scheme based on injection-locked ring oscillators (ILROs), which achieves more than one UI of phase shift for multiple clock phases. This eliminates the phase rotation and interpolation required in conventional architectures. Each receiver is optimized for power efficiency and consists of three parts: a low-power linear equalizer, four offset-cancelled quantizers for 1:4 demultiplexing, and an injection-locked ring oscillator coupled to a low-voltage-swing global clock distribution. Measurement results show a 6.4-7.2 Gb/s data rate with BER < 10⁻¹² across 14 cm of PCB, and an 8 Gb/s data rate through 4 cm of PCB. Designed in a 1.2-V, 90-nm CMOS process, the ILRO achieves a wide tuning range of 1.6-2.6 GHz. The total area of each receiver is 0.0174 mm², and the measured power efficiency is 0.6 mW/Gb/s.
Improving upon the first test chip, a second test chip for 8 Gb/s forwarded-clock serial link receivers exploits a low-power super-harmonic injection-locked ring oscillator for symmetric multi-phase local clock generation and deskewing. Further power reduction comes from designing most of the receiver circuits in the near-threshold region (0.6-V supply). Only the global clock buffer, test buffers, and synthesized digital test circuits run at the nominal 1-V supply. At the architectural level, a 1:10 direct demultiplexing rate is chosen to enable low-supply operation through high parallelism. Fabricated in 65-nm CMOS technology, the test chip integrates two receiver prototypes, one without and one with front-end bootstrapped S/Hs. Including the amortized power of global clock distribution, the two receivers consume 1.3 mW and 2 mW, respectively, at an 8 Gb/s input data rate. This corresponds to power efficiencies of 0.163 mW/Gb/s and 0.25 mW/Gb/s. Measurement results show that both receivers achieve BER < 10⁻¹² across a 20-cm FR4 PCB channel.