4 research outputs found

A Design Methodology for Referenceless Clock and Data Recovery Circuits Based on a Stochastic Frequency Detector

Author: 최홍석
Publication venue: 서울대학교 대학원
Publication date: 01/08/2022

Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, 2022. 8. 정덕균.

In this thesis, a design of a high-speed, power-efficient, wide-range clock and data recovery (CDR) circuit without a reference clock is proposed. A frequency acquisition scheme using a stochastic frequency detector (SFD) based on the Alexander phase detector (PD) is used for the referenceless operation. Pattern histogram analysis is presented to analyze the frequency acquisition behavior of the SFD and is verified by simulation. Based on the information obtained from the pattern histogram analysis, an SFD using autocovariance is proposed. With a direct-proportional path and a digital integral path, the proposed referenceless CDR achieves frequency lock under all measurable conditions, and the measured frequency acquisition time is within 7 μs in all cases. The prototype chip was fabricated in a 40-nm CMOS process and occupies an active area of 0.032 mm². The proposed referenceless CDR achieves a BER of less than 10^-12 at 32 Gb/s and an energy efficiency of 1.15 pJ/b at 32 Gb/s with a 1.0 V supply.

Table of contents:
CHAPTER 1 INTRODUCTION
  1.1 MOTIVATION
  1.2 THESIS ORGANIZATION
CHAPTER 2 BACKGROUNDS
  2.1 CLOCKING ARCHITECTURES IN SERIAL LINK INTERFACE
  2.2 GENERAL CONSIDERATIONS FOR CLOCK AND DATA RECOVERY
    2.2.1 OVERVIEW
    2.2.2 JITTER
    2.2.3 CDR JITTER CHARACTERISTICS
  2.3 CDR ARCHITECTURES
    2.3.1 PLL-BASED CDR – WITH EXTERNAL REFERENCE CLOCK
    2.3.2 DLL/PI-BASED CDR
    2.3.3 PLL-BASED CDR – WITHOUT EXTERNAL REFERENCE CLOCK
  2.4 FREQUENCY ACQUISITION SCHEME
    2.4.1 TYPICAL FREQUENCY DETECTORS
      2.4.1.1 DIGITAL QUADRICORRELATOR FREQUENCY DETECTOR
      2.4.1.2 ROTATIONAL FREQUENCY DETECTOR
    2.4.2 PRIOR WORKS
CHAPTER 3 DESIGN OF THE REFERENCELESS CDR USING SFD
  3.1 OVERVIEW
  3.2 PROPOSED FREQUENCY DETECTOR
    3.2.1 MOTIVATION
    3.2.2 PATTERN HISTOGRAM ANALYSIS
    3.2.3 INTRODUCTION OF AUTOCOVARIANCE TO STOCHASTIC FREQUENCY DETECTOR
  3.3 CIRCUIT IMPLEMENTATION
    3.3.1 IMPLEMENTATION OF THE PROPOSED REFERENCELESS CDR
    3.3.2 CONTINUOUS-TIME LINEAR EQUALIZER (CTLE)
    3.3.3 DIGITALLY-CONTROLLED OSCILLATOR (DCO)
  3.4 MEASUREMENT RESULTS
CHAPTER 4 CONCLUSION
APPENDIX A DETAILED FREQUENCY ACQUISITION WAVEFORMS OF THE PROPOSED SFD
BIBLIOGRAPHY
ABSTRACT (IN KOREAN)
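
For readers unfamiliar with the Alexander phase detector named in the abstract above, the following is a minimal sketch of its textbook early/late decision, not the thesis's stochastic frequency detector. The function name `alexander_pd`, the sample names, and the sign convention (+1 for late, -1 for early) are assumptions made for illustration only.

```python
def alexander_pd(d_prev: int, edge: int, d_curr: int) -> int:
    """Bang-bang (Alexander) phase decision from three consecutive samples.

    d_prev : data sample taken at the previous bit centre (0 or 1)
    edge   : sample taken at the nominal bit boundary (0 or 1)
    d_curr : data sample taken at the current bit centre (0 or 1)

    Returns +1 if the clock is late, -1 if it is early, and 0 when there
    is no data transition (no timing information). The sign convention
    is an assumption of this sketch.
    """
    if d_prev == d_curr:
        # No transition between the two data samples: nothing to decide.
        return 0
    # The edge sample still equals the old bit: it was taken before the
    # data transition, so the clock is early. Otherwise it is late.
    return -1 if edge == d_prev else +1


if __name__ == "__main__":
    for samples in [(0, 0, 1), (0, 1, 1), (1, 1, 1)]:
        print(samples, alexander_pd(*samples))
```

Because the detector produces only the sign of the phase error, and only on data transitions, its output is usually averaged or filtered before it drives an oscillator.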

SNU Open Repository and Archive

A Reference-Less Single-Loop Half-Rate Binary CDR

Author: Ali Sheikholeslami, Hirotaka Tamura, Masaya Kibune, Mohammad Sadegh Jalali
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)

Crossref

Architectures and Circuits Leveraging Injection-Locked Oscillators for Ultra-Low Voltage Clock Synthesis and Reference-less Receivers for Dense Chip-to-Chip Communications

Author: Gangasani Gautam
Publication venue: Columbia University Libraries/Information Services
Publication date: 01/01/2018

High-performance computing is critical for the needs of scientific discovery and economic competitiveness. An extreme-scale computing system at 1000x the performance of today’s petaflop machines will exhibit massive parallelism on multiple vertical fronts, from thousands of computational units on a single processor to thousands of processors in a single data center. A key challenge in enabling such massively parallel extreme-scale computing is power. The challenge is not the power associated with base computation but rather the problem of transporting data from one chip to another at high enough rates. This thesis presents architectures and techniques that achieve low power and a small area footprint while sustaining high data rates in a dense very-short-reach (VSR) chip-to-chip (C2C) communication network. High-speed serial communication operating at ultra-low supplies improves the energy efficiency and lowers the power envelope of a system doing an exaflop of loops. One focus area of this thesis is clock synthesis for such energy-efficient interconnect applications operating at high speeds and ultra-low supplies. A sub-integer clock-frequency synthesizer is presented that incorporates a multi-phase injection-locked ring-oscillator-based prescaler for operation at an ultra-low supply voltage of 0.5 V, phase-switching-based programmable division for sub-integer clock-frequency synthesis, and automatic calibration to ensure injection lock. A record speed of 9 GHz has been demonstrated at 0.5 V in 45-nm SOI CMOS. It consumes 3.5 mW of power at 9.12 GHz and 0.052 of area, while showing an output phase noise of -100 dBc/Hz at 1 MHz offset and RMS jitter of 325 fs; it achieves a net of -186.5 in a 45-nm SOI CMOS process. This thesis also describes a receiver with a reference-less clocking architecture for high-density VSR-C2C links. This architecture simplifies clock-tree planning in dense extreme-scale computing environments and has a high-bandwidth CDR to enable SSC for suppressing EMI and to mitigate TX jitter requirements. It features a clock-less DFE and a high-bandwidth CDR based on master-slave ILOs for phase generation/rotation. The RX is implemented in 14-nm CMOS and characterized at 19 Gb/s. It is 1.5x faster than previous reference-less embedded-oscillator-based designs with greater than 100 MHz jitter tolerance bandwidth and recovers error-free data over VSR-C2C channels. It achieves a power efficiency of 2.9 pJ/b while recovering error-free data (BER 200MHz and the INL of the ILO-based phase rotator (32 steps/UI) is <1 LSB. Lastly, this thesis develops a time-domain delay-based model of injection locking to describe injection-locking phenomena in nonharmonic oscillators. The model is used to predict the locking bandwidth and the locking dynamics of the locked oscillator. The model predictions are verified against simulations and measurements of a four-stage differential ring oscillator. The model is further used to predict the injection-locking behavior of a single-ended CMOS-inverter-based ring oscillator, the lock range of a multi-phase injection-locked ring-oscillator-based prescaler, and the dynamics of tracking injection phase perturbations in injection-locked master-slave oscillators, demonstrating its versatility in application to any nonharmonic oscillator.

Columbia University Academic Commons

Design of High-Speed SerDes Transceiver for Chip-to-Chip Communications in CMOS Process

Author: Zheng Xuqiang

With the continuous increase of on-chip computation capacity and the exponential growth of data-intensive applications, high-speed data transmission through serial links has become the backbone of modern communication systems. To satisfy the massive data-exchange requirement, the data rate of such serial links has risen from several Gb/s to tens of Gb/s. Currently, commercial standards such as Ethernet 400GbE, InfiniBand high data rate (HDR), and common electrical interface (CEI)-56G have been developing towards 40+ Gb/s. As the core component within these links, the transceiver chipset plays a fundamental role in balancing operation speed, power consumption, area occupation, and operation range. Meanwhile, the CMOS process has become the dominant technology in modern transceiver chip fabrication due to its large-scale digital integration capability and aggressive pricing advantage. This research aims to explore advanced techniques that are capable of exploiting the maximum operation speed of the CMOS process, and hence provides potential solutions for 40+ Gb/s CMOS transceiver designs. The major contributions are summarized as follows. A low-jitter ring-oscillator-based injection-locked clock multiplier (RILCM) with a hybrid frequency tracking loop that consists of a traditional phase-locked loop (PLL), a timing-adjusted loop, and a loop-selection state machine is implemented in a 65-nm CMOS process. In the ring voltage-controlled oscillator, a full-swing pseudo-differential delay cell is proposed to lower the conversion of device noise to phase noise. To obtain high operation speed and high detection accuracy, a compact timing-adjusted phase detector tightly combined with a well-matched charge pump is designed. Meanwhile, a lock-loss detection and lock-recovery mechanism is devised to give the RILCM a lock-acquisition ability similar to that of a conventional PLL, thus removing the need for an initial frequency setup aid and preventing the potential lock-loss risk. The experimental results show that the figure-of-merit of the designed RILCM reaches -247.3 dB, which is better than previous RILCMs and even comparable to the large-area LC-ILCMs. The transmitter (TX) and receiver (RX) chips are separately designed and fabricated in a 65-nm CMOS process. The transmitter chip employs a quarter-rate multi-multiplexer (MUX)-based 4-tap feed-forward equalizer (FFE) to pre-distort the output. To increase the maximum operating speed, a bandwidth-enhanced 4:1 MUX capable of eliminating the charge-sharing effect is proposed. To produce the quarter-rate parallel data streams with appropriate delays, a compact latch array associated with an interleaved-retiming technique is designed. The receiver chip employs a two-stage continuous-time linear equalizer (CTLE) as the analog front-end and integrates an improved clock and data recovery circuit to extract the sampling clocks and retime the incoming data. To automatically balance jitter tracking and jitter suppression, passive low-pass filters with adaptively adjusted bandwidth are introduced into the data-sampling path. To optimize the linearity of the phase interpolation, a time-averaging-based compensating phase interpolator is proposed. For equalization, a combined TX-FFE and RX-CTLE is applied to compensate for the channel loss, where a low-cost edge-data correlation-based sign zero-forcing adaptation algorithm is proposed to automatically adjust the TX-FFE’s tap weights. Measurement results show that the fabricated transmitter/receiver chipset can deliver 40 Gb/s random data at a bit error rate of 16 dB loss at the half-baud frequency, while consuming a total power of 370 mW.

University of Lincoln Institutional Repository
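
The abstracts above report link efficiency in different forms (pJ/b in two cases, total mW in the third). As a rough aid for comparing them, the sketch below converts between the two using the identity pJ/b × Gb/s = mW. The helper names `power_mw` and `energy_pj_per_bit` are hypothetical. Pairing the 2.9 pJ/b figure with the 19 Gb/s characterization rate, and dividing the 370 mW chipset total by 40 Gb/s, are assumptions of this sketch rather than figures stated in the abstracts.

```python
def power_mw(energy_pj_per_bit_value: float, rate_gbps: float) -> float:
    """Power in mW from energy per bit (pJ/b) and data rate (Gb/s).

    1e-12 J/b * 1e9 b/s = 1e-3 W, so the product is already in mW.
    """
    return energy_pj_per_bit_value * rate_gbps


def energy_pj_per_bit(power_mw_value: float, rate_gbps: float) -> float:
    """Energy per bit in pJ/b from total power (mW) and data rate (Gb/s)."""
    return power_mw_value / rate_gbps


if __name__ == "__main__":
    # Referenceless CDR: 1.15 pJ/b at 32 Gb/s (from the first abstract).
    print(f"CDR power:      {power_mw(1.15, 32):.1f} mW")
    # Reference-less RX: 2.9 pJ/b, assumed here to apply at 19 Gb/s.
    print(f"RX power:       {power_mw(2.9, 19):.1f} mW")
    # TX/RX chipset: 370 mW total at 40 Gb/s (derived, not reported).
    print(f"Chipset energy: {energy_pj_per_bit(370, 40):.2f} pJ/b")
```

These derived numbers cover different circuit blocks (a CDR alone, a receiver, a full transmitter/receiver chipset) in different process nodes, so they are not a like-for-like comparison.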