Search CORE

5 research outputs found

A 0.8V, 560fJ/bit, 14Gb/s injection-locked receiver with input duty-cycle distortion tolerable edge-rotating 5/4X sub-rate CDR in 65nm CMOS

Author
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date
Field of study

차세대 HBM 용 고집적, 저전력 송수신기 설계

Author: 고한곤
Publication venue: 서울대학교 대학원
Publication date: 01/08/2020
Field of study

학위논문 (박사) -- 서울대학교 대학원 : 공과대학 전기·정보공학부, 2020. 8. 정덕균.This thesis presents design techniques for high-density power-efficient transceiver for the next-generation high bandwidth memory (HBM). Unlike the other memory interfaces, HBM uses a 3D-stacked package using through-silicon via (TSV) and a silicon interposer. The transceiver for HBM should be able to solve the problems caused by the 3D-stacked package and TSV. At first, a data (DQ) receiver for HBM with a self-tracking loop that tracks a phase skew between DQ and data strobe (DQS) due to a voltage or thermal drift is proposed. The self-tracking loop achieves low power and small area by uti-lizing an analog-assisted baud-rate phase detector. The proposed pulse-to-charge (PC) phase detector (PD) converts the phase skew to a voltage differ-ence and detects the phase skew from the voltage difference. An offset calibra-tion scheme that can compensates for a mismatch of the PD is also proposed. The proposed calibration scheme operates without any additional sensing cir-cuits by taking advantage of the write training of HBM. Fabricated in 65 nm CMOS, the DQ receiver shows a power efficiency of 370 fJ/b at 4.8 Gb/s and occupies 0.0056 mm2. The experimental results show that the DQ receiver op-erates without any performance degradation under a ± 10% supply variation. In a second prototype IC, a high-density transceiver for HBM with a feed-forward-equalizer (FFE)-combined crosstalk (XT) cancellation scheme is pre-sented. To compensate for the XT, the transmitter pre-distorts the amplitude of the FFE output according to the XT. Since the proposed XT cancellation (XTC) scheme reuses the FFE implemented to equalize the channel loss, additional circuits for the XTC is minimized. Thanks to the XTC scheme, a channel pitch can be significantly reduced, allowing for the high channel density. Moreover, the 3D-staggered channel structure removes the ground layer between the verti-cally adjacent channels, which further reduces a cross-sectional area of the channel per lane. The test chip including 6 data lanes is fabricated in 65 nm CMOS technology. The 6-mm channels are implemented on chip to emulate the silicon interposer between the HBM and the processor. The operation of the XTC scheme is verified by simultaneously transmitting 4-Gb/s data to the 6 consecutive channels with 0.5-um pitch and the XTC scheme reduces the XT-induced jitter up to 78 %. The measurement result shows that the transceiver achieves the throughput of 8 Gb/s/um. The transceiver occupies 0.05 mm2 for 6 lanes and consumes 36.6 mW at 6 x 4 Gb/s.본 논문에서는 차세대 HBM을 위한 고집적 저전력 송수신기 설계 방법을 제안한다. 첫 번째로, 전압 및 온도 변화에 의한 데이터와 클럭 간 위상 차이를 보상할 수 있는 자체 추적 루프를 가진 데이터 수신기를 제안한다. 제안하는 자체 추적 루프는 데이터 전송 속도와 같은 속도로 동작하는 위상 검출기를 사용하여 전력 소모와 면적을 줄였다. 또한 메모리의 쓰기 훈련 (write training) 과정을 이용하여 효과적으로 위상 검출기의 오프셋을 보상할 수 있는 방법을 제안한다. 제안하는 데이터 수신기는 65 nm 공정으로 제작되어 4.8 Gb/s에서 370 fJ/b을 소모하였다. 또한 10 % 의 전압 변화에 대하여 안정적으로 동작하는 것을 확인하였다. 두 번째로, 피드 포워드 이퀄라이저와 결합된 크로스 토크 보상 방식을 활용한 고집적 송수신기를 제안한다. 제안하는 송신기는 크로스 토크 크기에 해당하는 만큼 송신기 출력을 왜곡하여 크로스 토크를 보상한다. 제안하는 크로스 토크 보상 방식은 채널 손실을 보상하기 위해 구현된 피드 포워드 이퀄라이저를 재활용함으로써 추가적인 회로를 최소화한다. 제안하는 송수신기는 크로스 토크가 보상 가능하기 때문에, 채널 간격을 크게 줄여 고집적 통신을 구현하였다. 또한 집적도를 더 증가시키기 위해 세로로 인접한 채널 사이의 차폐 층을 제거한 적층 채널 구조를 제안한다. 6개의 송수신기를 포함한 프로토타입 칩은 65 nm 공정으로 제작되었다. HBM과 프로세서 사이의 silicon interposer channel 을 모사하기 위한 6 mm 의 채널이 칩 위에 구현되었다. 제안하는 크로스 토크 보상 방식은 0.5 um 간격의 6개의 인접한 채널에 동시에 데이터를 전송하여 검증되었으며, 크로스 토크로 인한 지터를 최대 78 % 감소시켰다. 제안하는 송수신기는 8 Gb/s/um 의 처리량을 가지며 6 개의 송수신기가 총 36.6 mW의 전력을 소모하였다.CHAPTER 1 INTRODUCTION 1 1.1 MOTIVATION 1 1.2 THESIS ORGANIZATION 4 CHAPTER 2 BACKGROUND ON HIGH-BANDWIDTH MEMORY 6 2.1 OVERVIEW 6 2.2 TRANSCEIVER ARCHITECTURE 10 2.3 READ/WRITE OPERATION 15 2.3.1 READ OPERATION 15 2.3.2 WRITE OPERATION 19 CHAPTER 3 BACKGROUNDS ON COUPLED WIRES 21 3.1 GENERALIZED MODEL 21 3.2 EFFECT OF CROSSTALK 26 CHAPTER 4 DQ RECEIVER WITH BAUD-RATE SELF-TRACKING LOOP 29 4.1 OVERVIEW 29 4.2 FEATURES OF DQ RECEIVER FOR HBM 33 4.3 PROPOSED PULSE-TO-CHARGE PHASE DETECTOR 35 4.3.1 OPERATION OF PULSE-TO-CHARGE PHASE DETECTOR 35 4.3.2 OFFSET CALIBRATION 37 4.3.3 OPERATION SEQUENCE 39 4.4 CIRCUIT IMPLEMENTATION 42 4.5 MEASUREMENT RESULT 46 CHAPTER 5 HIGH-DENSITY TRANSCEIVER FOR HBM WITH 3D-STAGGERED CHANNEL AND CROSSTALK CANCELLATION SCHEME 57 5.1 OVERVIEW 57 5.2 PROPOSED 3D-STAGGERED CHANNEL 61 5.2.1 IMPLEMENTATION OF 3D-STAGGERED CHANNEL 61 5.2.2 CHANNEL CHARACTERISTICS AND MODELING 66 5.3 PROPOSED FEED-FORWARD-EQUALIZER-COMBINED CROSSTALK CANCELLATION SCHEME 72 5.4 CIRCUIT IMPLEMENTATION 77 5.4.1 OVERALL ARCHITECTURE 77 5.4.2 TRANSMITTER WITH FFE-COMBINED XTC 79 5.4.3 RECEIVER 81 5.5 MEASUREMENT RESULT 82 CHAPTER 6 CONCLUSION 93 BIBLIOGRAPHY 95 초 록 102Docto

SNU Open Repository and Archive

Design Techniques for High Pin Efficiency Wireline Transceivers

Author: Fan Yang-Hang
Publication venue
Publication date: 17/12/2020
Field of study

While the majority of wireline research investigates bandwidth improvement and how to overcome the high channel loss, pin efficiency is also critical in high-performance wireline applications. This dissertation proposes two different implementations for high pin efficiency wireline transceivers. The first prototype achieves twice pin efficiency than unidirectional signaling, which is 32Gb/s simultaneous bidirectional transceiver supporting transmission and reception on the same channel at the same time. It includes an efficient low-swing voltage-mode driver with an R-gm hybrid for signal separation, combining the continuous-time-linear-equalizer (CTLE) and echo cancellation (EC) in a single stage, and employing a low-complexity 5/4X CDA system. Support of a wide range of channels is possible with foreground adaptation of the EC finite impulse response (FIR) filter taps with a sign-sign least-mean-square (SSLMS) algorithm. Fabricated in TSMC 28-nm CMOS, the 32Gb/s SBD transceiver occupies

0.09 mm^{2}

area and achieves 16Gb/s uni-directional and 32Gb/s simultaneous bi-directional signals. 32Gb/s SBD operation consumes 1.83mW/Gb/s with 10.8dB channel loss at Nyquist rate. The second prototype presents an optical transmitter with a quantum-dot (QD) microring laser. This can support wavelength-division multiplexing allowing for high pin efficiency application by packing multiple high-bandwidth signals onto one optical channel. The development QD microring laser model accurately captures the intrinsic photonic high-speed dynamics and allows for the future co-design of the circuits and photonic device. To achieve higher bandwidth than intrinsic one, utilizing both techniques of optical injection locking (OIL) and 2-tap asymmetric Feed-forward equalizer (FFE) can perform 22Gb/s operation with 3.2mW/Gb/s. The first hybrid-integration directly-modulated OIL QD microring laser system is demonstrated

Texas A&M Repository

고속 DRAM 인터페이스를 위한 전압 및 온도에 둔감한 클록 패스와 위상 오류 교정기 설계

Author: 신소영
Publication venue: 서울대학교 대학원
Publication date: 01/02/2021
Field of study

학위논문 (박사) -- 서울대학교 대학원 : 공과대학 전기·정보공학부, 2021. 2. 정덕균.To cope with problems caused by the high-speed operation of the dynamic random access memory (DRAM) interface, several approaches are proposed that are focused on the clock path of the DRAM. Two delay-locked loop (DLL) based schemes, a forwarded-clock (FC) receiver (RX) with self-tracking loop and a quadrature error corrector, are proposed. Moreover, an open-loop based scheme is presented for drift compensation in the clock distribution. The open-loop scheme consumes less power consumption and reduces design complexity. The FC RX uses DLLs to compensate for voltage and temperature (VT) drift in unmatched memory interfaces. The self-tracking loop consists of two-stage cascaded DLLs to operate in a DRAM environment. With the write training and the proposed DLL, the timing relationship between the data and the sampling clock is always optimal. The proposed scheme compensates for delay drift without relying on data transitions or re-training. The proposed FC RX is fabricated in 65-nm CMOS process and has an active area containing 4 data lanes of 0.0329 mm2. After the write training is completed at the supply voltage of 1 V, the measured timing margin remains larger than 0.31-unit interval (UI) when the supply voltage drifts in the range of 0.94 V and 1.06 V from the training voltage, 1 V. At the data rate of 6.4 Gb/s, the proposed FC RX achieves an energy efficiency of 0.45 pJ/bit. Contrary to the aforementioned scheme, an open-loop-based voltage drift compensation method is proposed to minimize power consumption and occupied area. The overall clock distribution is composed of a current mode logic (CML) path and a CMOS path. In the proposed scheme, the architecture of the CML-to-CMOS converter (C2C) and the inverter is changed to compensate for supply voltage drift. The bias generator provides bias voltages to the C2C and inverters according to supply voltage for delay adjustment. The proposed clock tree is fabricated in 40 nm CMOS process and the active area is 0.004 mm2. When the supply voltage is modulated by a sinusoidal wave with 1 MHz, 100 mV peak-to-peak swing from the center of 1.1 V, applying the proposed scheme reduces the measured root-mean-square (RMS) jitter from 3.77 psRMS to 1.61 psRMS. At 6 GHz output clock, the power consumption of the proposed scheme is 11.02 mW. A DLL-based quadrature error corrector (QEC) with a wide correction range is proposed for the DRAM whose clocks are distributed over several millimeters. The quadrature error is corrected by adjusting delay lines using information from the phase error detector. The proposed error correction method minimizes increased jitter due to phase error correction by setting at least one of the delay lines in the quadrature clock path to the minimum delay. In addition, the asynchronous calibration on-off scheme reduces power consumption after calibration is complete. The proposed QEC is fabricated in 40 nm CMOS process and has an active area of 0.048 mm2. The proposed QEC exhibits a wide correctable error range of 101.6 ps and the remaining phase errors are less than 2.18° from 0.8 GHz to 2.3 GHz clock. At 2.3 GHz, the QEC contributes 0.53 psRMS jitter. Also, at 2.3 GHz, the power consumption is reduced from 8.89 mW to 3.39 mW when the calibration is off.본 논문에서는 동적 랜덤 액세스 메모리 (DRAM)의 속도가 증가함에 따라 클록 패스에서 발생할 수 있는 문제에 대처하기 위한 세 가지 회로들을 제안하였다. 제안한 회로들 중 두 방식들은 지연동기루프 (delay-locked loop) 방식을 사용하였고 나머지 한 방식은 면적과 전력 소모를 줄이기 위해 오픈 루프 방식을 사용하였다. DRAM의 비정합 수신기 구조에서 데이터 패스와 클록 패스 간의 지연 불일치로 인해 전압 및 온도 변화에 따라 셋업 타임 및 홀드 타임이 줄어드는 문제를 해결하기 위해 지연동기루프를 사용하였다. 제안한 지연동기루프 회로는 DRAM 환경에서 동작하도록 두 개의 지연동기루프로 나누었다. 또한 초기 쓰기 훈련을 통해 데이터와 클록을 타이밍 마진 관점에서 최적의 위치에 둘 수 있다. 따라서 제안하는 방식은 데이터 천이 정보가 필요하지 않다. 65-nm CMOS 공정을 이용하여 만들어진 칩은 6.4 Gb/s에서 0.45 pJ/bit의 에너지 효율을 가진다. 또한 1 V에서 쓰기 훈련 및 지연동기루프를 고정시키고 0.94 V에서 1.06 V까지 공급 전압이 바뀌었을 때 타이밍 마진은 0.31 UI보다 큰 값을 유지하였다. 다음으로 제안하는 회로는 클록 분포 트리에서 전압 변화로 인해 클록 패스의 지연이 달라지는 것을 앞서 제시한 방식과 달리 오픈 루프 방식으로 보상하였다. 기존 클록 패스의 인버터와 CML-to-CMOS 변환기의 구조를 변경하여 바이어스 생성 회로에서 생성한 공급 전압에 따라 바뀌는 바이어스 전압을 가지고 지연을 조절할 수 있게 하였다. 40-nm CMOS 공정을 이용하여 만들어진 칩의 6 GHz 클록에서의 전력 소모는 11.02 mW로 측정되었다. 1.1 V 중심으로 1 MHz, 100 mV 피크 투 피크를 가지는 사인파 성분으로 공급 전압을 변조하였을 때 제안한 방식에서의 지터는 기존 방식의 3.77 psRMS에서 1.61 psRMS로 줄어들었다. DRAM의 송신기 구조에서 다중 위상 클록 간의 위상 오차는 송신된 데이터의 데이터 유효 창을 감소시킨다. 이를 해결하기 위해 지연동기루프를 도입하게 되면 증가된 지연으로 인해 위상이 교정된 클록에서 지터가 증가한다. 본 논문에서는 증가된 지터를 최소화하기 위해 위상 교정으로 인해 증가된 지연을 최소화하는 위상 교정 회로를 제시하였다. 또한 유휴 상태에서 전력 소모를 줄이기 위해 위상 오차를 교정하는 회로를 입력 클록과 비동기식으로 끌 수 있는 방법 또한 제안하였다. 40-nm CMOS 공정을 이용하여 만들어진 칩의 위상 교정 범위는 101.6 ps이고 0.8 GHz 부터 2.3 GHz까지의 동작 주파수 범위에서 위상 교정기의 출력 클록의 위상 오차는 2.18°보다 작다. 제안하는 위상 교정 회로로 인해 추가된 지터는 2.3 GHz에서 0.53 psRMS이고 교정 회로를 껐을 때 전력 소모는 교정 회로가 켜졌을 때인 8.89 mW에서 3.39 mW로 줄어들었다.Chapter 1 Introduction 1 1.1 Motivation 1 1.2 Thesis Organization 4 Chapter 2 Background on DRAM Interface 5 2.1 Overview 5 2.2 Memory Interface 7 Chapter 3 Background on DLL 11 3.1 Overview 11 3.2 Building Blocks 15 3.2.1 Delay Line 15 3.2.2 Phase Detector 17 3.2.3 Charge Pump 19 3.2.4 Loop filter 20 Chapter 4 Forwarded-Clock Receiver with DLL-based Self-tracking Loop for Unmatched Memory Interfaces 21 4.1 Overview 21 4.2 Proposed Separated DLL 25 4.2.1 Operation of the Proposed Separated DLL 27 4.2.2 Operation of the Digital Loop Filter in DLL 31 4.3 Circuit Implementation 33 4.4 Measurement Results 37 4.4.1 Measurement Setup and Sequence 38 4.4.2 VT Drift Measurement and Simulation 40 Chapter 5 Open-loop-based Voltage Drift Compensation in Clock Distribution 46 5.1 Overview 46 5.2 Prior Works 50 5.3 Voltage Drift Compensation Method 52 5.4 Circuit Implementation 57 5.5 Measurement Results 61 Chapter 6 Quadrature Error Corrector with Minimum Total Delay Tracking 68 6.1 Overview 68 6.2 Prior Works 70 6.3 Quadrature Error Correction Method 73 6.4 Circuit Implementation 82 6.5 Measurement Results 88 Chapter 7 Conclusion 96 Bibliography 98 초록 102Docto

SNU Open Repository and Archive

저전력, 저면적 유선 송수신기 설계를 위한 회로 기술

Author: 배우람
Publication venue: 서울대학교 대학원
Publication date: 01/08/2016
Field of study

학위논문 (박사)-- 서울대학교 대학원 : 전기·컴퓨터공학부, 2016. 8. 정덕균.In this thesis, novel circuit techniques for low-power and area-efficient wireline transceiver, including a phase-locked loop (PLL) based on a two-stage ring oscillator, a scalable voltage-mode transmitter, and a forwarded-clock (FC) receiver based on a delay-locked-loop (DLL) based per-pin deskew, are proposed. At first, a two-stage ring PLL that provides a four-phase, high-speed clock for a quarter-rate TX in order to minimize power consumption is presented. Several analyses and verification techniques, ranging from the clocking architectures for a high-speed TX to oscillation failures in a two-stage ring oscillator, are addressed in this thesis. A tri-state-inverter–based frequency-divider and an AC-coupled clock-buffer are used for high-speed operations with minimal power and area overheads. The proposed PLL fabricated in the 65-nm CMOS technology occupies an active area of 0.009 mm2 with an integrated-RMS-jitter of 414 fs from 10 kHz to 100 MHz while consuming 7.6 mW from a 1.2-V supply at 10 GHz. The resulting figure-of-merit is -238.8 dB, which surpasses that of the state-of-the-art ring-PLLs by 4 dB. Secondly, a voltage-mode (VM) transmitter which offers a wide operation range of 6 to 32 Gb/s, controllable pre-emphasis equalization and output voltage swing without altering output impedance, and a power supply scalability is presented. A quarter-rate clocking architecture is employed in order to maximize the scalability and energy efficiency across the variety of operating conditions. A P-over-N VM driver is used for CMOS compatibility and wide voltage-swing range required for various I/O standards. Two supply regulators calibrate the output impedance of the VM driver across the wide swing and pre-emphasis range. A single phase-locked loop is used to provide a wide frequency range of 1.5-to-8 GHz. The prototype chip is fabricated in 65-nm CMOS technology and occupies active area of 0.48x0.36 mm2. The proposed transmitter achieves 250-to-600-mV single-ended swing and exhibits the energy efficiency of 2.10-to-2.93 pJ/bit across the data rate of 6-to-32 Gb/s. And last, this thesis describes a power and area-efficient FC receiver and includes an analysis of the jitter tolerance of the FC receiver. In the proposed design, jitter tolerance is maximized according to the analysis by employing a DLL-based de-skewing. A sample-swapping bang-bang phase-detector (SS-BBPD) eliminates the stuck locking caused by the finite delay range of the voltage-controlled delay line (VCDL), and also reduces the required delay range of the VCDL by half. The proposed FC receiver is fabricated in 65-nm CMOS technology and occupies an active area of 0.025 mm2. At a data rate of 12.5 Gb/s, the proposed FC receiver exhibits an energy efficiency of 0.36 pJ/bit, and tolerates 1.4-UIpp sinusoidal jitter of 300 MHz.Chapter 1. Introduction 1 1.1. Motivation 1 1.2. Thesis organization 5 Chapter 2. Phase-Locked Loop Based on Two-Stage Ring Oscillator 7 2.1. Overivew 7 2.2. Background and Analysis of a Two-stage Ring Oscillator 11 2.3. Circuit Implementation of The Proposed PLL 25 2.4. Measurement Results 33 Chapter 3. A Scalable Voltage-Mode Transmitter 37 3.1. Overview 37 3.2. Design Considerations on a Scalable Serial Link Transmitter 40 3.3. Circuit Implementation 46 3.4. Measurement Results 56 Chapter 4. Delay-Locked Loop Based Forwarded-Clock Receiver 62 4.1. Overview 62 4.2. Timing and Data Recovery in a Serial Link 65 4.3. DLL-Based Forwarded-Clock Receiver Characteristics 70 4.4. Circuit Implementation 79 4.5. Measurement Results 89 Chapter 5. Conclusion 94 Appendix 96 Appendix A. Design flow to optimize a high-speed ring oscillator 96 Appendix B. Reflection Issues in N-over-N Voltage-Mode Driver 99 Appendix C. Analysis on output swing and power consumption of the P-over-N voltage-mode driver 107 Appendix D. Loop Dynamics of DLL 112 Bibliography 121 Abstract 128Docto

SNU Open Repository and Archive