4 research outputs found
ํต๊ณ์ ์ฃผํ์ ๊ฒ์ถ๊ธฐ ๊ธฐ๋ฐ ๊ธฐ์ค ์ฃผํ์๋ฅผ ์ฌ์ฉํ์ง ์๋ ํด๋ก ๋ฐ ๋ฐ์ดํฐ ๋ณต์ ํ๋ก์ ์ค๊ณ ๋ฐฉ๋ฒ๋ก
ํ์๋
ผ๋ฌธ(๋ฐ์ฌ) -- ์์ธ๋ํ๊ต๋ํ์ : ๊ณต๊ณผ๋ํ ์ ๊ธฐยท์ ๋ณด๊ณตํ๋ถ, 2022. 8. ์ ๋๊ท .In this thesis, a design of a high-speed, power-efficient, wide-range clock and data recovery (CDR) without a reference clock is proposed. A frequency acquisition scheme using a stochastic frequency detector (SFD) based on the Alexander phase detector (PD) is utilized for the referenceless operation. Pat-tern histogram analysis is presented to analyze the frequency acquisition behavior of the SFD and verified by simulation. Based on the information obtained by pattern histogram analysis, SFD using autocovariance is proposed. With a direct-proportional path and a digital integral path, the proposed referenceless CDR achieves frequency lock at all measurable conditions, and the measured frequency acquisition time is within 7ฮผs. The prototype chip has been fabricated in a 40-nm CMOS process and occupies an active area of 0.032 mm2. The proposed referenceless CDR achieves the BER of less than 10-12 at 32 Gb/s and exhibits an energy efficiency of 1.15 pJ/b at 32 Gb/s with a 1.0 V supply.๋ณธ ๋
ผ๋ฌธ์ ๊ธฐ์ค ํด๋ญ์ด ์๋ ๊ณ ์, ์ ์ ๋ ฅ, ๊ด๋์ญ์ผ๋ก ๋์ํ๋ ํด๋ญ ๋ฐ ๋ฐ์ดํฐ ๋ณต์ํ๋ก์ ์ค๊ณ๋ฅผ ์ ์ํ๋ค. ๊ธฐ์ค ํด๋ญ์ด ์๋ ๋์์ ์ํด์ ์๋ ์ฐ๋ ์์ ๊ฒ์ถ๊ธฐ์ ๊ธฐ๋ฐํ ํต๊ณ์ ์ฃผํ์ ๊ฒ์ถ๊ธฐ๋ฅผ ์ฌ์ฉํ๋ ์ฃผํ์ ํ๋ ๋ฐฉ์์ด ์ฌ์ฉ๋๋ค. ํต๊ณ์ ์ฃผํ์ ๊ฒ์ถ๊ธฐ์ ์ฃผํ์ ์ถ์ ์์์ ๋ถ์ํ๊ธฐ ์ํด ํจํด ํ์คํ ๊ทธ๋จ ๋ถ์ ๋ฐฉ๋ฒ๋ก ์ ์ ์ํ์๊ณ ์๋ฎฌ๋ ์ด์
์ ํตํด ๊ฒ์ฆํ์๋ค. ํจํด ํ์คํ ๊ทธ๋จ ๋ถ์์ ํตํด ์ป์ ์ ๋ณด๋ฅผ ๋ฐํ์ผ๋ก ์๊ธฐ๊ณต๋ถ์ฐ์ ์ด์ฉํ ํต๊ณ์ ์ฃผํ์ ๊ฒ์ถ๊ธฐ๋ฅผ ์ ์ํ๋ค. ์ง์ ๋น๋ก ๊ฒฝ๋ก์ ๋์งํธ ์ ๋ถ ๊ฒฝ๋ก๋ฅผ ํตํด ์ ์๋ ๊ธฐ์ค ํด๋ญ์ด ์๋ ํด๋ญ ๋ฐ ๋ฐ์ดํฐ ๋ณต์ํ๋ก๋ ๋ชจ๋ ์ธก์ ๊ฐ๋ฅํ ์กฐ๊ฑด์์ ์ฃผํ์ ์ ๊ธ์ ๋ฌ์ฑํ๋ ๋ฐ ์ฑ๊ณตํ์๊ณ , ๋ชจ๋ ๊ฒฝ์ฐ์์ ์ธก์ ๋ ์ฃผํ์ ์ถ์ ์๊ฐ์ 7ฮผs ์ด๋ด์ด๋ค. 40-nm CMOS ๊ณต์ ์ ์ด์ฉํ์ฌ ๋ง๋ค์ด์ง ์นฉ์ 0.032 mm2์ ๋ฉด์ ์ ์ฐจ์งํ๋ค. ์ ์ํ๋ ํด๋ญ ๋ฐ ๋ฐ์ดํฐ ๋ณต์ํ๋ก๋ 32 Gb/s์ ์๋์์ ๋นํธ์๋ฌ์จ 10-12 ์ดํ๋ก ๋์ํ์๊ณ , ์๋์ง ํจ์จ์ 32Gb/s์ ์๋์์ 1.0V ๊ณต๊ธ์ ์์ ์ฌ์ฉํ์ฌ 1.15 pJ/b์ ๋ฌ์ฑํ์๋ค.CHAPTER 1 INTRODUCTION 1
1.1 MOTIVATION 1
1.2 THESIS ORGANIZATION 13
CHAPTER 2 BACKGROUNDS 14
2.1 CLOCKING ARCHITECTURES IN SERIAL LINK INTERFACE 14
2.2 GENERAL CONSIDERATIONS FOR CLOCK AND DATA RECOVERY 24
2.2.1 OVERVIEW 24
2.2.2 JITTER 26
2.2.3 CDR JITTER CHARACTERISTICS 33
2.3 CDR ARCHITECTURES 39
2.3.1 PLL-BASED CDR โ WITH EXTERNAL REFERENCE CLOCK 39
2.3.2 DLL/PI-BASED CDR 44
2.3.3 PLL-BASED CDR โ WITHOUT EXTERNAL REFERENCE CLOCK 47
2.4 FREQUENCY ACQUISITION SCHEME 50
2.4.1 TYPICAL FREQUENCY DETECTORS 50
2.4.1.1 DIGITAL QUADRICORRELATOR FREQUENCY DETECTOR 50
2.4.1.2 ROTATIONAL FREQUENCY DETECTOR 54
2.4.2 PRIOR WORKS 56
CHAPTER 3 DESIGN OF THE REFERENCELESS CDR USING SFD 58
3.1 OVERVIEW 58
3.2 PROPOSED FREQUENCY DETECTOR 62
3.2.1 MOTIVATION 62
3.2.2 PATTERN HISTOGRAM ANALYSIS 68
3.2.3 INTRODUCTION OF AUTOCOVARIANCE TO STOCHASTIC FREQUENCY DETECTOR 75
3.3 CIRCUIT IMPLEMENTATION 83
3.3.1 IMPLEMENTATION OF THE PROPOSED REFERENCELESS CDR 83
3.3.2 CONTINUOUS-TIME LINEAR EQUALIZER (CTLE) 85
3.3.3 DIGITALLY-CONTROLLED OSCILLATOR (DCO) 87
3.4 MEASUREMENT RESULTS 89
CHAPTER 4 CONCLUSION 99
APPENDIX A DETAILED FREQUENCY ACQUISITION WAVEFORMS OF THE PROPOSED SFD 100
BIBLIOGRAPHY 108
์ด ๋ก 122๋ฐ
Recommended from our members
Architectures and Circuits Leveraging Injection-Locked Oscillators for Ultra-Low Voltage Clock Synthesis and Reference-less Receivers for Dense Chip-to-Chip Communications
High performance computing is critical for the needs of scientific discovery and economic competitiveness. An extreme-scale computing system at 1000x the performance of todayโs petaflop machines will exhibit massive parallelism on multiple vertical fronts, from thousands of computational units on a single processor to thousands of processors in a single data center. To facilitate such a massively-parallel extreme-scale computing, a key challenge is power. The challenge is not power associated with base computation but rather the problem of transporting data from one chip to another at high enough rates. This thesis presents architectures and techniques to achieve low power and area footprint while achieving high data rates in a dense very-short reach (VSR) chip-to-chip (C2C) communication network. High-speed serial communication operating at ultra-low supplies improves the energy-efficiency and lowers the power envelop of a system doing an exaflop of loops. One focus area of this thesis is clock synthesis for such energy-efficient interconnect applications operating at high speeds and ultra-low supplies. A sub-integer clockfrequency synthesizer is presented that incorporates a multi-phase injection-locked ring-oscillator-based prescaler for operation at an ultra-low supply voltage of 0.5V, phase-switching based programmable division for sub-integer clock-frequency synthesis, and automatic calibration to ensure injection lock. A record speed of 9GHz has been demonstrated at 0.5V in 45nm SOI CMOS. It consumes 3.5mW of power at 9.12GHz and 0.052 of area, while showing an output phase noise of -100dBc/Hz at 1MHz offset and RMS jitter of 325fs; it achieves a net of -186.5 in a 45-nm SOI CMOS process. This thesis also describes a receiver with a reference-less clocking architecture for high-density VSR-C2C links. This architecture simplifies clock-tree planning in dense extreme-scaling computing environments and has high-bandwidth CDR to enable SSC for suppressing EMI and to mitigate TX jitter requirements. It features clock-less DFE and a high-bandwidth CDR based on master-slave ILOs for phase generation/rotation. The RX is implemented in 14nm CMOS and characterized at 19Gb/s. It is 1.5x faster that previous reference-less embedded-oscillator based designs with greater than 100MHz jitter tolerance bandwidth and recovers error-free data over VSR-C2C channels. It achieves a power-efficiency of 2.9pJ/b while recovering error-free data (BER 200MHz and the INL of the ILO-based phase-rotator (32- Steps/UI) is <1-LSB. Lastly, this thesis develops a time-domain delay-based modeling of injection locking to describe injection-locking phenomena in nonharmonic oscillators. The model is used to predict the locking bandwidth, and the locking dynamics of the locked oscillator. The model predictions are verified against simulations and measurements of a four-stage differential ring oscillator. The model is further used to predict the injection-locking behavior of a single-ended CMOS inverter based ring oscillator, the lock range of a multi-phase injection-locked ring-oscillator-based prescaler, as well as the dynamics of tracking injection phase perturbations in injection-locked masterslave oscillators; demonstrating its versatility in application to any nonharmonic oscillator
Design of High-Speed SerDes Transceiver for Chip-to-Chip Communications in CMOS Process
With the continuous increase of on-chip computation capacities and exponential growth of data-intensive applications, the high-speed data transmission through serial links has become the backbone for modern communication systems. To satisfy the massive data-exchanging requirement, the data rate of such serial links has been updated from several Gb/s to tens of Gb/s. Currently, the commercial standards such as Ethernet 400GbE, InfiniBand high data rate (HDR), and common electrical interface (CEI)-56G has been developing towards 40+ Gb/s. As the core component within these links, the transceiver chipset plays a fundamental role in balancing the operation speed, power consumption, area occupation, and operation range. Meanwhile, the CMOS process has become the dominant technology in modern transceiver chip fabrications due to its large-scale digital integration capability and aggressive pricing advantage. This research aims to explore advanced techniques that are capable of exploiting the maximum operation speed of the CMOS process, and hence provides potential solutions for 40+ Gb/s CMOS transceiver designs. The major contributions are summarized as follows.
A low jitter ring-oscillator-based injection-locked clock multiplier (RILCM) with a hybrid frequency tracking loop that consists of a traditional phase-locked loop (PLL), a timing-adjusted loop, and a loop selection state-machine is implemented in 65-nm C-MOS process. In the ring voltage-controlled oscillator, a full-swing pseudo-differential delay cell is proposed to lower the device noise to phase noise conversion. To obtain high operation speed and high detection accuracy, a compact timing-adjusted phase detector tightly combined with a well-matched charge pump is designed. Meanwhile, a lock-loss detection and lock recovery is devised to endow the RILCM with a similar lock-acquisition ability as conventional PLL, thus excluding the initial frequency set- I up aid and preventing the potential lock-loss risk. The experimental results show that the figure-of-merit of the designed RILCM reaches -247.3 dB, which is better than previous RILCMs and even comparable to the large-area LC-ILCMs.
The transmitter (TX) and receiver (RX) chips are separately designed and fab- ricated in 65-nm CMOS process. The transmitter chip employs a quarter-rate multi-multiplexer (MUX)-based 4-tap feed-forward equalizer (FFE) to pre-distort the output. To increase the maximum operating speed, a bandwidth-enhanced 4:1 MUX with the capability of eliminating charge-sharing effect is proposed. To produce the quarter-rate parallel data streams with appropriate delays, a compact latch array associated with an interleaved-retiming technique is designed. The receiver chip employs a two-stage continuous-time linear equalizer (CTLE) as the analog front-end and integrates an improved clock data recovery to extract the sampling clocks and retime the incoming data. To automatically balance the jitter tracking and jitter suppression, passive low-pass filters with adaptively-adjusted bandwidth are introduced into the data-sampling path. To optimize the linearity of the phase interpolation, a time-averaging-based compensating phase interpolator is proposed. For equalization, a combined TX-FFE and RX-CTLE is applied to compensate for the channel loss, where a low-cost edge-data correlation-based sign zero-forcing adaptation algorithm is proposed to automatically adjust the TX-FFEโs tap weights. Measurement results show that the fabricated transmitter/receiver chipset can deliver 40 Gb/s random data at a bit error rate of 16 dB loss at the half-baud frequency, while consuming a total power of 370 mW