148 research outputs found

    A Bang-Bang All-Digital PLL for Frequency Synthesis

    Get PDF
    abstract: Phase locked loops are an integral part of any electronic system that requires a clock signal and find use in a broad range of applications such as clock and data recovery circuits for high speed serial I/O and frequency synthesizers for RF transceivers and ADCs. Traditionally, PLLs have been primarily analog in nature and since the development of the charge pump PLL, they have almost exclusively been analog. Recently, however, much research has been focused on ADPLLs because of their scalability, flexibility and higher noise immunity. This research investigates some of the latest all-digital PLL architectures and discusses the qualities and tradeoffs of each. A highly flexible and scalable all-digital PLL based frequency synthesizer is implemented in 180 nm CMOS process. This implementation makes use of a binary phase detector, also commonly called a bang-bang phase detector, which has potential of use in high-speed, sub-micron processes due to the simplicity of the phase detector which can be implemented with a simple D flip flop. Due to the nonlinearity introduced by the phase detector, there are certain performance limitations. This architecture incorporates a separate frequency control loop which can alleviate some of these limitations, such as lock range and acquisition time.Dissertation/ThesisM.S. Electrical Engineering 201

    A high speed serializer/deserializer design

    Get PDF
    A Serializer/Deserializer (SerDes) is a circuit that converts parallel data into a serial stream and vice versa. It helps solve clock/data skew problems, simplifies data transmission, lowers the power consumption and reduces the chip cost. The goal of this project was to solve the challenges in high speed SerDes design, which included the low jitter design, wide bandwidth design and low power design. A quarter-rate multiplexer/demultiplexer (MUX/DEMUX) was implemented. This quarter-rate structure decreases the required clock frequency from one half to one quarter of the data rate. It is shown that this significantly relaxes the design of the VCO at high speed and achieves lower power consumption. A novel multi-phase LC-ring oscillator was developed to supply a low noise clock to the SerDes. This proposed VCO combined an LC-tank with a ring structure to achieve both wide tuning range (11%) and low phase noise (-110dBc/Hz at 1MHz offset). With this structure, a data rate of 36 Gb/s was realized with a measured peak-to-peak jitter of 10ps using 0.18microm SiGe BiCMOS technology. The power consumption is 3.6W with 3.4V power supply voltage. At a 60 Gb/s data rate the simulated peak-to-peak jitter was 4.8ps using 65nm CMOS technology. The power consumption is 92mW with 2V power supply voltage. A time-to-digital (TDC) calibration circuit was designed to compensate for the phase mismatches among the multiple phases of the PLL clock using a three dimensional fully depleted silicon on insulator (3D FDSOI) CMOS process. The 3D process separated the analog PLL portion from the digital calibration portion into different tiers. This eliminated the noise coupling through the common substrate in the 2D process. Mismatches caused by the vertical tier-to-tier interconnections and the temperature influence in the 3D process were attenuated by the proposed calibration circuit. The design strategy and circuits developed from this dissertation provide significant benefit to both wired and wireless applications

    ๋ฉ”๋ชจ๋ฆฌ ์ธํ„ฐํŽ˜์ด์Šค๋ฅผ ์œ„ํ•œ ๋ฉ€ํ‹ฐ ๋ ˆ๋ฒจ ๋‹จ์ผ ์ข…๋‹จ ์†ก์‹ ๊ธฐ ์„ค๊ณ„

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ (๋ฐ•์‚ฌ) -- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ๊ณต๊ณผ๋Œ€ํ•™ ์ „๊ธฐยท์ปดํ“จํ„ฐ๊ณตํ•™๋ถ€, 2020. 8. ๊น€์ˆ˜ํ™˜.๋ณธ ์—ฐ๊ตฌ์—์„œ ๋ฉ”๋ชจ๋ฆฌ ์ธํ„ฐํŽ˜์ด์Šค๋ฅผ ์œ„ํ•œ ๋ฉ€ํ‹ฐ ๋ ˆ๋ฒจ ์†ก์‹ ๊ธฐ๊ฐ€ ์ œ์‹œ๋˜์—ˆ๋‹ค. ํ”„๋กœ์„ธ์„œ์™€ ๋ฉ”๋ชจ๋ฆฌ ๊ฐ„์˜ ์„ฑ๋Šฅ ์ฐจ์ด๊ฐ€ ๋งค๋…„ ๊ณ„์† ์ฆ๊ฐ€ํ•จ์— ๋”ฐ๋ผ, ๋ฉ”๋ชจ๋ฆฌ๋Š” ์ „์ฒด ์‹œ์Šคํ…œ์˜ ๋ณ‘๋ชฉ์ ์ด ๋˜๊ณ ์žˆ๋‹ค. ์šฐ๋ฆฌ๋Š” ๋ฉ”๋ชจ๋ฆฌ ๋Œ€์—ญํญ์„ ๋Š˜๋ฆฌ๊ธฐ ์œ„ํ•ด PAM-4 ๋‹จ์ผ ์ข…๋‹จ ์†ก์‹ ๊ธฐ๋ฅผ ์ œ์•ˆํ•˜์˜€๊ณ , ๋ฉ€ํ‹ฐ ๋žญํฌ ๋ฉ”๋ชจ๋ฆฌ๋ฅผ ์œ„ํ•œ duobinary ๋‹จ์ผ ์ข…๋‹จ ์†ก์‹ ๊ธฐ๋ฅผ ์ œ์•ˆํ•˜์˜€๋‹ค. ์ œ์•ˆ๋œ PAM-4 ์†ก์‹ ๊ธฐ์˜ ๋“œ๋ผ์ด๋ฒ„๋Š” ๋†’์€ ์„ ํ˜•์„ฑ๊ณผ ์ž„ํ”ผ๋˜์Šค ์ •ํ•ฉ์„ ๋™์‹œ์— ๋งŒ์กฑํ•œ๋‹ค. ๋˜ํ•œ ์ €ํ•ญ์ด๋‚˜ ์ธ๋•ํ„ฐ๋ฅผ ์‚ฌ์šฉํ•˜์ง€ ์•Š์•„ ์ž‘์€ ๋ฉด์ ์„ ์ฐจ์ง€ํ•œ๋‹ค. ์ œ์•ˆ๋œ ZQ ์บ˜๋ฆฌ๋ธŒ๋ ˆ์ด์…˜์€ ์„ธ๊ฐœ์˜ ๊ต์ • ์ ์„ ๊ฐ€์ง€๊ณ  ์žˆ์–ด ์†ก์‹ ๊ธฐ๊ฐ€ ์ •ํ™•ํ•œ ์ž„ํ”ผ๋˜์Šค์™€ ์„ ํ˜•์ ์ธ ์ถœ๋ ฅ์„ ๊ฐ–๊ฒŒ ํ•œ๋‹ค. ํ”„๋กœํ†  ํƒ€์ž…์€ 65nm CMOS ๊ณต์ •์œผ๋กœ ์ œ์ž‘๋˜์—ˆ๊ณ  ์†ก์‹ ๊ธฐ๋Š” 0.0333mm2์˜ ๋ฉด์ ์„ ์ฐจ์ง€ํ•œ๋‹ค. ์ธก์ •๋œ 28Gb/s์—์„œ์˜ eye๋Š” 18.3ps์˜ ๊ธธ์ด์™€ 42.4mV์˜ ๋†’์ด๋ฅผ ๊ฐ–๊ณ , ์—๋„ˆ์ง€ ํšจ์œจ์€ 0.64pJ/bit์ด๋‹ค. ZQ ์บ˜๋ฆฌ๋ธŒ๋ ˆ์ด์…˜๊ณผ ํ•จ๊ป˜ ์ธก์ •๋œ RLM์€ 0.993์ด๋‹ค. ๋ฉ”๋ชจ๋ฆฌ์˜ ์šฉ๋Ÿ‰์„ ๋Š˜๋ฆฌ๊ธฐ ์œ„ํ•ด ํ•˜๋‚˜์˜ ํŒจํ‚ค์ง€์— ์—ฌ๋Ÿฌ ๊ฐœ์˜ DRAM ๋‹ค์ด๋ฅผ ์ˆ˜์ง์œผ๋กœ ์Œ“๋Š” ํŒจํ‚ค์ง•์€ ๋ฉ”๋ชจ๋ฆฌ์˜ ์ค‘์•™ ํŒจ๋“œ ๊ตฌ์กฐ์™€ ๊ฒฐํ•ฉ๋˜์–ด ์งง์€ ๋ฐ˜์‚ฌ๋ฅผ ์•ผ๊ธฐํ•˜๋Š” ์Šคํ…์„ ๋งŒ๋“ ๋‹ค. ์šฐ๋ฆฌ๋Š” ์ด ๋ฌธ์ œ๋ฅผ ์™„ํ™”ํ•˜๊ธฐ์œ„ํ•ด ๋ฐ˜์‚ฌ ๊ธฐ๋ฐ˜ duobinary ์†ก์‹ ๊ธฐ๋ฅผ ์ œ์•ˆํ–ˆ๋‹ค. ์ด ์†ก์‹ ๊ธฐ๋Š” ๋ฐ˜์‚ฌ๋ฅผ ์ด์šฉํ•˜์—ฌ duobinary signaling์„ ํ•œ๋‹ค. 2ํƒญ ๋ฐ˜๋Œ€ ๊ฐ•์กฐ ๊ธฐ์ˆ ๊ณผ ์Šฌ๋ฃจ ๋ ˆ์ดํŠธ ์กฐ์ ˆ ๊ธฐ์ˆ ์ด ์‹ ํ˜ธ ์™„๊ฒฐ์„ฑ์„ ๋†’์ด๊ธฐ ์œ„ํ•ด ์‚ฌ์šฉ๋˜์—ˆ๋‹ค. NRZ eye๊ฐ€ ์—†๋Š” 10Gb/s์—์„œ ์ธก์ •๋œ duobinary eye๋Š” 63.6ps ๊ธธ์ด์™€ 70.8mV์˜ ๋†’์ด๋ฅผ ๊ฐ–๋Š”๋‹ค. ์ธก์ •๋œ ์—๋„ˆ์ง€ ํšจ์œจ์€ 1.38pJ/bit์ด๋‹ค.Multi-level transmitters for memory interfaces have been presented. The performance gap between processor and memory has been increased by 50% every year, making memory to be a bottle neck of the overall system. To increase memory bandwidth, we have proposed a PAM-4 single-ended transmitter. To compensate for the side effect of the multi-rank memory, we have proposed a reflection-based duobinary transmitter. The proposed PAM-4 transmitter has the driver, which simultaneously satisfies impedance matching and high linearity. The driver occupies a small area due to a resistorless and inductorless structure. The proposed ZQ calibration for PAM-4 has three calibration points, which allow the transmitter to have accurate impedance and linear output. The ZQ calibration considers impedance variation of both the driver and the receiver. A prototype has been fabricated in 65nm CMOS process, and the transmitter occupies 0.0333mm2. The measured eye has a width of 18.3ps and a height of 42.4mV at 28Gb/s, and the measured energy efficiency is 0.64pJ/b. The measured RLM with the 3-point ZQ calibration is 0.993. To increase memory density, the stacked die packaging with multiple DRAM die stacked vertically in one package is widely used. However, combined with the center-pad structure, the structure creates stubs that cause short reflections. We have proposed the reflection-based duobinary transmitter to mitigate this problem. The proposed transmitter uses reflection for duobinary signaling. The 2-tap opposite FFE and the slew-rate control are used to increase signal integrity. The measured duobinary eye at 10Gb/s has a width of 63.6ps and a height of 70.8mV while there is no NRZ eye opening. The measured energy efficiency is 1.38pJ/bit.CHAPTER 1 INTRODUCTION 1 1.1 MOTIVATION 1 1.2 THESIS ORGANIZATION 8 CHAPTER 2 MUTI-LEVEL SIGNALING 9 2.1 PAM-4 SIGNALING 9 2.2 DESIGN CONSIDERATIONS FOR PAM-4 TRANSMITTER 16 2.2.1 LEVEL SEPARATION MISMATCH RATIO (RLM) 17 2.2.2 IMPEDANCE MATCHING 19 2.2.3 PRIOR ARTS 21 2.3 DUOBINARY SIGNALING 24 CHAPTER 3 HIGH-LINEARITY AND IMPEDANCE-MATCHED PAM-4 TRANSMITTER 30 3.1 OVERALL ARCHITECTURE 31 3.2 SINGLE-ENDED IMPEDANCE-MATCHED PAM-4 DRIVER 33 3.3 3-POINT ZQ CALIBRATION FOR PAM-4 47 CHAPTER 4 REFLECTION-BASED DUOBINARY TRANSMITTER 57 4.1 BIDIRECTIONAL DUAL-RANK MEMORY SYSTEM 58 4.2 CONCEPT OF REFLECTION-BASED DUOBINARY SIGNALING 66 4.3 REFLECTION-BASED DUOBINARY TRANSMITTER 70 4.3.1 OVERALL ARCHITECTURE 70 4.3.2 EQUALIZATION FOR REFLECTION-BASED DUOBINARY SIGNALING 72 4.3.3 2D BINARY-SEGMENTED DRIVER 75 CHAPTER 5 EXPERIMENTAL RESULTS 77 5.1 HIGH-LINEARITY AND IMPEDANCE-MATCHED PAM-4 TRANSMITTER 77 5.2 REFLECTION-BASED DUOBINARY TRANSMITTER 84 CHAPTER 6 92 CONCLUSION 92 BIBLIOGRAPHY 94Docto

    High Speed Reconfigurable NRZ/PAM4 Transceiver Design Techniques

    Get PDF
    While the majority of wireline standards use simple binary non-return-to-zero (NRZ) signaling, four-level pulse-amplitude modulation (PAM4) standards are emerging to increase bandwidth density. This dissertation proposes efficient implementations for high speed NRZ/PAM4 transceivers. The first prototype includes a dual-mode NRZ/PAM4 serial I/O transmitter which can support both modulations with minimum power and hardware overhead. A source-series-terminated (SST) transmitter achieves 1.2Vpp output swing and employs lookup table (LUT) control of a 31-segment output digital-to-analog converter (DAC) to implement 4/2-tap feed-forward equalization (FFE) in NRZ/PAM4 modes, respectively. Transmitter power is improved with low-overhead analog impedance control in the DAC cells and a quarter-rate serializer based on a tri-state inverter-based mux with dynamic pre-driver gates. The transmitter is designed to work with a receiver that implements an NRZ/PAM4 decision feedback equalizer (DFE) that employs 1 finite impulse response (FIR) and 2 infinite impulse response (IIR) taps for first post-cursor and long-tail ISI cancellation, respectively. Fabricated in GP 65-nm CMOS, the transmitter occupies 0.060mmยฒ area and achieves 16Gb/s NRZ and 32Gb/s PAM4 operation at 10.4 and 4.9 mW/Gb/s while operating over channels with 27.6 and 13.5dB loss at Nyquist, respectively. The second prototype presents a 56Gb/s four-level pulse amplitude modulation (PAM4) quarter-rate wireline receiver which is implemented in a 65nm CMOS process. The frontend utilize a single stage continuous time linear equalizer (CTLE) to boost the main cursor and relax the pre-cursor cancelation requirement, requiring only a 2-tap pre-cursor feed-forward equalization (FFE) on the transmitter side. A 2-tap decision feedback equalizer (DFE) with one finite impulse response (FIR) tap and one infinite impulse response (IIR) tap is employed to cancel first post-cursor and longtail inter-symbol interference (ISI). The FIR tap direct feedback is implemented inside the CML slicers to relax the critical timing of DFE and maximize the achievable data-rate. In addition to the per-slice main 3 data samplers, an error sampler is utilized for background threshold control and an edge-based sampler performs both PLL-based CDR phase detection and generates information for background DFE tap adaptation. The receiver consumes 4.63mW/Gb/s and compensates for up to 20.8dB loss when operated with a 2- tap FFE transmitter. The experimental results and comparison with state-of-the-art shows superior power efficiency of the presented prototypes for similar data-rate and channel loss. The usage of proposed design techniques are not limited to these specific prototypes and can be applied for any wireline transceiver with different modulation, data-rate and CMOS technology

    High Voltage and Nanoscale CMOS Integrated Circuits for Particle Physics and Quantum Computing

    Get PDF

    A LPDDR4 MEMORY CONTROLLER DESIGN WITH EYE CENTER DETECTION ALGORITHM

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ (๋ฐ•์‚ฌ)-- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ์ „๊ธฐยท์ปดํ“จํ„ฐ๊ณตํ•™๋ถ€, 2016. 2. ๊น€์ˆ˜ํ™˜.The demand for higher bandwidth with reduced power consumption in mobile memory is increasing. In this thesis, architecture of the LPDDR4 memory controller, operated with a LPDDR4 memory, is proposed and designed, and efficient training algorithm, which is appropriate for this architecture, is proposed for memory training and verification. The operation speed range of the LPDDR4 memory specification is from 533Mbps to 4266Mbps, and the LPDDR4 memory controller is designed to support that range of the LPDDR4 memory. The phase-locked loop in the LPDDR4 memory controller is designed to operate between 1333MHz and 2133MHz. To cover the range of the LPDDR4 memory, the selectable frequency divider is used to provide operation clock. The output frequency of the phase-locked loop with divider is from 266MHz to 2133MHz. The delay-locked loop in the LPDDR4 memory controller is designed to operate between 266MHz and 2133MHz with 180หš phase locking. The delay-locked loop is used each training operation, which is command training, data read and write training. To complete training in each training stage, eye center detection algorithm is used. The circuits for the proposed eye center detection algorithm such as delay line, phase interpolator and reference generator are designed and validated. The proposed 1x2y3x eye center detection algorithm is 23 times faster than conventional two-dimensional eye center detection algorithm and it can be implemented simply. Using 65nm CMOS process, the proposed LPDDR4 memory controller occupies 12mm2. The verification of the LPDDR4 memory controller is performed with commodity LPDDR4 memory. The verification of all training sequence, which is power on, initializing, boot up, command training, write leveling, read training, write training, is performed in this environment. The low voltage swing terminated logic driver and other several functions, including write leveling and data transmission, are verified at 4266Mbps and the entire LPDDR4 memory controller operations from 566Mbps to 1600Mbps are verified. The proposed eye center detection algorithm is verified from 566Mbps to 2843Mbps.CHAPTER 1 INTRODUCTION 1 1.1 MOTIVATION 1 1.2 INTRODUCTION 5 1.3 THESIS ORGANIZATION 7 CHAPTER 2 LPDDR4 MEMORY CONTROLLER DESIGN 8 2.1 DIFFERENCE BETWEEN LPDDR3 AND LPDDR4 MEMORY 8 2.1.1 ARCHITECTURAL DIFFERENCE BETWEEN LPDDR3 AND LPDDR4 MEMORY 10 2.1.2 SOURCE SYNCHRONOUS MATCHED SCHEME AND UNMATCHED SCHEME 11 2.1.3 LOW VOLTAGE SWING TERMINATED LOGIC DRIVER AND TERMINATION SCHEME 12 2.2 LPDDR4 MEMORY CONTROLLER SPECIFICATION 15 2.3 DESIGN PROCEDURE 18 CHAPTER 3 LPDDR4 MEMORY CONTROLLER ARCHITECTURE BASED ON MEMORY TRAINING 20 3.1 LPDDR4 MEMORY TRAINING SEQUENCE 20 3.2 LPDDR4 MEMORY TRAINING EYE DETECTION ALGORITHM 24 3.2.1 EYE CENTER DETECTION 24 3.2.2 1X2Y3X EYE CENTER DETECTION ALGORITHM 27 3.3. LPDDR4 MEMORY CONTROLLER DESIGN BASED ON MEMORY TRAINING 31 3.3.1 ARCHITECTURE FOR MEMORY BOOT UP AND POWER UP 31 3.3.2 CLOCK PATH ARCHITECTURE AND CLOCK TREE 34 3.3.3 COMMAND TRAINING AND COMMAND PATH ARCHITECTURE 35 3.3.4 WRITE LEVELING AND DATA STROBE TRANSMISSION PATH ARCHITECTURE 39 3.3.5 READ TRAINING AND READ PATH ARCHITECTURE 41 3.3.6 WRITE TRAINING AND WRITE PATH ARCHITECTURE 43 3.3.7 NORMAL READ/WRITE OPERATION AND MARGIN TEST 46 CHAPTER 4 LPDDR4 MEMORY CONTROLLER ARCHITECTURE MODELING AND CIRCUIT DESIGN 48 4.1 OVERALL LPDDR4 MEMORY CONTROLLER ARCHITECTURE MODELING 48 4.2 SIMULATION RESULT OF LPDDR4 MEMORY CONTROLLER MODELING 51 4.3 LPDDR4 MEMORY CONTROLLER CIRCUIT DESIGN 61 4.3.1 PHASE-LOCKED LOOP 61 4.3.2 DELAY-LOCKED LOOP 65 4.3.3 TRANSMITTER OF LPDDR4 MEMORY CONTROLLER: WRITE PATH 70 4.3.4 DE-SERIALIZER WITH CLOCK DOMAIN CROSSING 75 CHAPTER 5 MEASUREMENT RESULT OF LPDDR4 MEMORY CONTROLLER 77 5.1 LPDDR4 MEMORY CONTROLLER MEASUREMENT SETUP 77 5.1.1 LPDDR4 MEMORY CONTROLLER FLOOR PLAN AND LAYOUT 77 5.1.2 PACKAGE AND TEST BOARD 79 5.2 LPDDR4 MEMORY CONTROLLER SUB-BLOCK MEASUREMENT 81 5.2.1 PHASE-LOCKED LOOP 81 5.2.2 DELAY-LOCKED LOOP 83 5.2.3 200PS AND 800PS DELAY LINE 85 5.2.4 VOLTAGE REFERENCE GENERATOR 86 5.2.5 PHASE INTERPOLATOR 87 5.3 LPDDR4 MEMORY SYSTEM OPERATION MEASUREMENT 90 CHAPTER 6 CONCLUSION 93 APPENDIX OPERATION FLOW CHART OF THE PROPOSED LPDDR4 MEMORY CONTROLLER 95 BIBLIOGRAPHY 118 KOREAN ABSTRACT 124Docto

    Design of Low-Power NRZ/PAM-4 Wireline Transmitters

    Get PDF
    Rapid growing demand for instant multimedia access in a myriad of digital devices has pushed the need for higher bandwidth in modern communication hardwares ranging from short-reach (SR) memory/storage interfaces to long-reach (LR) data center Ethernets. At the same time, comprehensive design optimization of link system that meets the energy-efficiency is required for mobile computing and low operational cost at datacenters. This doctoral study consists of design of two low-swing wireline transmitters featuring a low-power clock distribution and 2-tap equalization in energy-efficient manners up to 20-Gb/s operation. In spite of the reduced signaling power in the voltage-mode (VM) transmit driver, the presence of the segment selection logic still diminishes the power saving benefit. The first work presents a scalable VM transmitter which offers low static power dissipation and adopts an impedance-modulated 2-tap equalizer with analog tap control, thereby obviating driver segmentation and reducing pre-driver complexity and dynamic power. Per-channel quadrature clock generation with injection-locked oscillators (ILO) allows the generation of rail-to-rail quadrature clocks. Energy efficiency is further improved with capacitively driven low-swing global clock distribution and supply scaling at lower data rates, while output eye quality is maintained at low voltages with automatic phase calibration of the local ILO-generated quarter-rate clocks. A prototype fabricated in a general purpose 65 nm CMOS process includes a 2 mm global clock distribution network and two transmitters that support an output swing range of 100-300mV with up to 12-dB of equalization. The transmitters achieve 8-16 Gb/s operation at 0.65-1.05 pJ/b energy efficiency. The second work involves a dual-mode NRZ/PAM-4 differential low-swing voltage-mode (VM) transmitter. The pulse-selected output multiplexing allows reduction of power supply and deterministic jitter caused by large on-chip parasitic inherent in the transmission-gate-based multiplexers in the earlier work. Analog impedance control replica circuits running in the background produce gate-biasing voltages that control the peaking ratio for 2-tap feed-forward equalization and PAM-4 symbol levels for high-linearity. This analog control also allows for efficient generation of the middle levels in PAM-4 operation with good linearity quantified by level separation mismatch ratio of 95%. In NRZ mode, 2-tap feedforward equalization is configurable in high-performance controlled-impedance or energy-efficient impedance-modulated settings to provide performance scalability. Analytic design consideration on dynamic power, data-rate, mismatch, and output swing brings optimal performance metric on the given technology node. The proof-of-concept prototype is verified on silicon with 65 nm CMOS process with improved performance in speed and energy-efficiency owing to double-stack NMOS transistors in the output stage. The transmitter consumes as low as 29.6mW in 20-Gb/s NRZ and 25.5mW in the 28-Gb/s PAM-4 operations

    Design Techniques for Energy Efficient Multi-GB/S Serial I/O Transceivers

    Get PDF
    Total I/O bandwidth demand is growing in high-performance systems due to the emergence of many-core microprocessors and in mobile devices to support the next generation of multi-media features. High-speed serial I/O energy efficiency must improve in order to enable continued scaling of these parallel computing platforms in applications ranging from data centers to smart mobile devices. The first work, a low-power forwarded-clock I/O transceiver architecture is presented that employs a high degree of output/input multiplexing, supply-voltage scaling with data rate, and low-voltage circuit techniques to enable low-power operation. The transmitter utilizes a 4:1 output multiplexing voltage-mode driver along with 4-phase clocking that is efficiently generated from a passive poly-phase filter. The output driver voltage swing is accurately controlled from 100-200 mV_(ppd) using a low-voltage pseudo-differential regulator that employs a partial negative-resistance load for improved low frequency gain. 1:8 input de-multiplexing is performed at the receiver equalizer output with 8 parallel input samplers clocked from an 8-phase injection-locked oscillator that provides more than 1UI de-skew range. Low-power high-speed serial I/O transmitters which include equalization to compensate for channel frequency dependent loss are required to meet the aggressive link energy efficiency targets of future systems. The second work presents a low power serial link transmitter design that utilizes an output stage which combines a voltage-mode driver, which offers low static-power dissipation, and current-mode equalization, which offers low complexity and dynamic-power dissipation. The utilization of current-mode equalization decouples the equalization settings and termination impedance, allowing for a significant reduction in pre-driver complexity relative to segmented voltage-mode drivers. Proper transmitter series termination is set with an impedance control loop which adjusts the on-resistance of the output transistors in the driver voltage-mode portion. Further reductions in dynamic power dissipation are achieved through scaling the serializer and local clock distribution supply with data rate. Finally, it presents that a scalable quarter-rate transmitter employs an analog-controlled impedance-modulated 2-tap voltage-mode equalizer and achieves fast power-state transitioning with a replica-biased regulator and ILO clock generation. Capacitively-driven 2 mm global clock distribution and automatic phase calibration allows for aggressive supply scaling

    ์˜คํ”„์…‹ ์ œ๊ฑฐ๊ธฐ์˜ ์ ์‘ ์ œ์–ด ๋“ฑํ™”๊ธฐ์™€ ๋ณด์šฐ-๋ ˆ์ดํŠธ ์œ„์ƒ ๊ฒ€์ถœ๊ธฐ๋ฅผ ํ™œ์šฉํ•œ ์ˆ˜์‹ ๊ธฐ ์„ค๊ณ„

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ(๋ฐ•์‚ฌ) -- ์„œ์šธ๋Œ€ํ•™๊ต๋Œ€ํ•™์› : ๊ณต๊ณผ๋Œ€ํ•™ ์ „๊ธฐยท์ •๋ณด๊ณตํ•™๋ถ€, 2021.8. ์—ผ์ œ์™„.In this thesis, designs of high-speed, low-power wireline receivers (RX) are explained. To be specific, the circuit techniques of DC offset cancellation, merged-summer DFE, stochastic Baud-rate CDR, and the phase detector (PD) for multi-level signal are proposed. At first, an RX with adaptive offset cancellation (AOC) and merged summer decision-feedback equalizer (DFE) is proposed. The proposed AOC engine removes the random DC offset of the data path by examining the random data stream's sampled data and edge outputs. In addition, the proposed RX incorporates a shared-summer DFE in a half-rate structure to reduce power dissipation and hardware complexity of the adaptive equalizer. A prototype chip fabricated in 40 nm CMOS technology occupies an active area of 0.083 mm2. Thanks to the AOC engine, the proposed RX achieves the BER of less than 10-12 in a wide range of data rates: 1.62-10 Gb/s. The proposed RX consumes 18.6 mW at 10 Gb/s over a channel with a 27 dB loss at 5 GHz, exhibiting a figure-of-merit of 0.068 pJ/b/dB. Secondly, a 40 nm CMOS RX with Baud-rate phase-detector (BRPD) is proposed. The RX includes two PDs: the BRPD employing the stochastic technique and the BRPD suitable for multi-level signals. Thanks to the Baud-rate CDRโ€™s advantage, by not using an edge-sampling clock, the proposed CDR can reduce the power consumption by lowering the hardware complexity. Besides, the proposed stochastic phase detector (SPD) tracks an optimal phase-locking point that maximizes the vertical eye opening. Furthermore, despite residual inter-symbol interference, proposed BRPD for multi-level signal secures vertical eye margin, which is especially vulnerable in the multi-level signal. Besides, the proposed BRPD has a unique lock point with an adaptive DFE, unlike conventional Mueller-Muller PD. A prototype chip fabricated in 40 nm CMOS technology occupies an active area of 0.24 mm2. The proposed PAM-4 RX achieves the bit-error-rate less than 10-11 in 48 Gb/s and the power efficiency of 2.42 pJ/b.๋ณธ ๋…ผ๋ฌธ์€ ๊ณ ์†, ์ €์ „๋ ฅ์œผ๋กœ ๋™์ž‘ํ•˜๋Š” ์œ ์„  ์ˆ˜์‹ ๊ธฐ์˜ ์„ค๊ณ„์— ๋Œ€ํ•ด ์„ค๋ช…ํ•˜๊ณ  ์žˆ๋‹ค. ๊ตฌ์ฒด์ ์œผ๋กœ ๋งํ•˜๋ฉด, ์˜คํ”„์…‹ ์ƒ์‡„, ๋ณ‘ํ•ฉ๋œ ์„œ๋จธ๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๊ฒฐ์ • ํ”ผ๋“œ๋ฐฑ ๋“ฑํ™”๊ธฐ ๊ธฐ์ˆ , ํ™•๋ฅ ์  ๋ณด์šฐ ๋ ˆ์ดํŠธ ํด๋Ÿญ๊ณผ ๋ฐ์ดํ„ฐ ๋ณต์›๊ธฐ, ๊ทธ๋ฆฌ๊ณ  ๋‹ค์ค‘ ๋ ˆ๋ฒจ ์‹ ํ˜ธ์— ์ ํ•ฉํ•œ ์œ„์ƒ ๊ฒ€์ถœ๊ธฐ๋ฅผ ์ œ์•ˆํ•œ๋‹ค. ์ฒซ์งธ๋กœ, ์ ์‘ ์˜คํ”„์…‹ ์ œ๊ฑฐ ๋ฐ ๋ณ‘ํ•ฉ๋œ ์„œ๋จธ๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๊ฒฐ์ • ํ”ผ๋“œ๋ฐฑ ๋“ฑํ™”๊ธฐ๋ฅผ ๊ฐ–์ถ˜ ์ˆ˜์‹ ๊ธฐ๋ฅผ ์ œ์•ˆํ•œ๋‹ค. ์ œ์•ˆ๋œ ์ ์‘ ์˜คํ”„์…‹ ์ œ๊ฑฐ ์—”์ง„์€ ์ž„์˜์˜ ๋ฐ์ดํ„ฐ ์ŠคํŠธ๋ฆผ์˜ ์ƒ˜ํ”Œ๋ง ๋ฐ์ดํ„ฐ, ์—์ง€ ์ถœ๋ ฅ์„ ๊ฒ€์‚ฌํ•˜์—ฌ ๋ฐ์ดํ„ฐ ๊ฒฝ๋กœ ์ƒ์˜ ์˜คํ”„์…‹์„ ์ œ๊ฑฐํ•œ๋‹ค. ๋˜ํ•œ ํ•˜ํ”„ ๋ ˆ์ดํŠธ ๊ตฌ์กฐ์˜ ๋ณ‘ํ•ฉ๋œ ์„œ๋จธ๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๊ฒฐ์ • ํ”ผ๋“œ๋ฐฑ ๋“ฑํ™”๊ธฐ๋Š” ์ „๋ ฅ์˜ ์‚ฌ์šฉ๊ณผ ํ•˜๋“œ์›จ์–ด์˜ ๋ณต์žก์„ฑ์„ ์ค„์ธ๋‹ค. 40 nm CMOS ๊ธฐ์ˆ ๋กœ ์ œ์ž‘๋œ ํ”„๋กœํ† ํƒ€์ž… ์นฉ์€ 0.083 mm2 ์˜ ๋ฉด์ ์„ ๊ฐ€์ง„๋‹ค. ์ ์‘ ์˜คํ”„์…‹ ์ œ๊ฑฐ๊ธฐ ๋•๋ถ„์— ์ œ์•ˆ๋œ ์ˆ˜์‹ ๊ธฐ๋Š” 10-12 ๋ฏธ๋งŒ์˜ BER์„ ๋‹ฌ์„ฑํ•œ๋‹ค. ๋˜ํ•œ ์ œ์•ˆ๋œ ์ˆ˜์‹ ๊ธฐ๋Š” 5GHz์—์„œ 27 dB์˜ ๋กœ์Šค๋ฅผ ๊ฐ–๋Š” ์ฑ„๋„์—์„œ 10 Gb/s์˜ ์†๋„์—์„œ 18.6 mW๋ฅผ ์†Œ๋น„ํ•˜๋ฉฐ 0.068 pJ/b/dB์˜ FoM์„ ๋‹ฌ์„ฑํ•˜์˜€๋‹ค. ๋‘๋ฒˆ์งธ๋กœ, ๋ณด์šฐ ๋ ˆ์ดํŠธ ์œ„์ƒ ๊ฒ€์ถœ๊ธฐ๊ฐ€ ์žˆ๋Š” 40 nm CMOS ์ˆ˜์‹ ๊ธฐ๊ฐ€ ์ œ์•ˆ๋˜์—ˆ๋‹ค. ์ˆ˜์‹ ๊ธฐ์—๋Š” ๋‘๊ฐœ์˜ ๋ณด์šฐ ๋ ˆ์ดํŠธ ์œ„์ƒ ๊ฒ€์ถœ๊ธฐ๋ฅผ ํฌํ•จํ•œ๋‹ค. ํ•˜๋‚˜๋Š” ํ™•๋ฅ ๋ก ์  ๊ธฐ๋ฒ•์„ ์‚ฌ์šฉํ•˜๋Š” ๋ณด์šฐ ๋ ˆ์ดํŠธ ์œ„์ƒ ๊ฒ€์ถœ๊ธฐ์ด๋‹ค. ๋ณด์šฐ ๋ ˆ์ดํŠธ ํด๋Ÿญ ๋ฐ์ดํ„ฐ ๋ณต์›๊ธฐ์˜ ์žฅ์  ๋•๋ถ„์— ์—์ง€ ์ƒ˜ํ”Œ๋ง ํด๋Ÿญ์„ ์‚ฌ์šฉํ•˜์ง€ ์•Š์Œ์œผ๋กœ์„œ ํŒŒ์›Œ์˜ ์†Œ๋ชจ์™€ ํ•˜๋“œ์›จ์–ด์˜ ๋ณต์žก์„ฑ์„ ์ค„์˜€๋‹ค. ๋˜ํ•œ ํ™•๋ฅ ์  ์œ„์ƒ ๊ฒ€์ถœ๊ธฐ๋Š” ์ˆ˜์ง ์•„์ด ์˜คํ”„๋‹์„ ์ตœ๋Œ€ํ™”ํ•˜๋Š” ์ตœ์ ์˜ ์œ„์ƒ ์ง€์ ์„ ์ฐพ์„ ์ˆ˜ ์žˆ์—ˆ๋‹ค. ๋‹ค๋ฅธ ์œ„์ƒ ๊ฒ€์ถœ๊ธฐ๋Š” ๋‹ค์ค‘ ๋ ˆ๋ฒจ ์‹ ํ˜ธ์— ์ ํ•ฉํ•œ ๋ฐฉ์‹์ด๋‹ค. ์‹ฌ๋ณผ ๊ฐ„ ๊ฐ„์„ญ์ด ๋‹ค์ค‘ ๋ ˆ๋ฒจ ์‹ ํ˜ธ์— ๋งค์šฐ ์ทจ์•ฝํ•œ ๋ฌธ์ œ๊ฐ€ ์žˆ๋”๋ผ๋„ ์ œ์•ˆ๋œ ๋‹ค์ค‘ ๋ ˆ๋ฒจ ์‹ ํ˜ธ์šฉ ๋ณด์šฐ ๋ ˆ์ดํŠธ ์œ„์ƒ ๊ฒ€์ถœ๊ธฐ๋Š” ์ˆ˜์ง ์•„์ด ๋งˆ์ง„์„ ํ™•๋ณดํ•œ๋‹ค. ๊ฒŒ๋‹ค๊ฐ€ ์ œ์•ˆ๋œ ๋ณด์šฐ ๋ ˆ์ดํŠธ ์œ„์ƒ ๊ฒ€์ถœ๊ธฐ๋Š” ๊ธฐ์กด์˜ ๋ฎฌ๋Ÿฌ-๋ฎ๋Ÿฌ ์œ„์ƒ ๊ฒ€์ถœ๊ธฐ์™€ ๋‹ฌ๋ฆฌ ์ ์‘ํ˜• ๊ฒฐ์ • ํ”ผ๋“œ๋ฐฑ ๋“ฑํ™”๊ธฐ๊ฐ€ ์žˆ๋”๋ผ๋„ ์œ ์ผํ•œ ๋ฝ ์ง€์ ์„ ๊ฐ–๋Š”๋‹ค. ํ”„๋กœํ† ํƒ€์ž… ์นฉ์€ 0.24mm2์˜ ๋ฉด์ ์„ ๊ฐ€์ง„๋‹ค. ์ œ์•ˆ๋œ PAM-4 ์ˆ˜์‹ ๊ธฐ๋Š” 48 Gb/s์˜ ์†๋„์—์„œ 10-11 ๋ฏธ๋งŒ์˜ BER์„ ๊ฐ€์ง€๊ณ , 2.42 pJ/b์˜ FoM์„ ๊ฐ€์ง„๋‹ค.CHAPTER 1 INTRODUCTION 1 1.1 MOTIVATION 1 1.2 THESIS ORGANIZATION 5 CHAPTER 2 BACKGROUNDS 6 2.1 BASIC ARCHITECTURE IN SERIAL LINK 6 2.1.1 SERIAL COMMUNICATION 6 2.1.2 CLOCK AND DATA RECOVERY 8 2.1.3 MULTI-LEVEL PULSE-AMPLITUDE MODULATION 10 2.2 EQUALIZER 12 2.2.1 EQUALIZER OVERVIEW 12 2.2.2 DECISION-FEEDBACK EQUALIZER 15 2.2.3 ADAPTIVE EQUALIZER 18 2.3 CLOCK RECOVERY 21 2.3.1 2X OVERSAMPLING PD ALEXANDER PD 22 2.3.2 BAUD-RATE PD MUELLER MULLER PD 25 CHAPTER 3 AN ADAPTIVE OFFSET CANCELLATION SCHEME AND SHARED SUMMER ADAPTIVE DFE 28 3.1 OVERVIEW 28 3.2 AN ADAPTIVE OFFSET CANCELLATION SCHEME AND SHARED-SUMMER ADAPTIVE DFE FOR LOW POWER RECEIVER 31 3.3 SHARED SUMMER DFE 37 3.4 RECEIVER IMPLEMENTATION 42 3.5 MEASUREMENT RESULTS 45 CHAPTER 4 PAM-4 BAUD-RATE DIGITAL CDR 51 4.1 OVERVIEW 51 4.2 OVERALL ARCHITECTURE 53 4.2.1 PROPOSED BAUD-RATE CDR ARCHITECTURE 53 4.2.2 PROPOSED ANALOG FRONT-END STRUCTURE 59 4.3 STOCHASTIC PHASE DETECTION PAM-4 CDR 64 4.3.1 PROPOSED STOCHASTIC PHASE DETECTION 64 4.3.2 COMPARISON OF THE STOCHASTIC PD WITH SS-MMPD 70 4.4 PHASE DETECTION FOR MULTI-LEVEL SIGNALING 73 4.4.1 PROPOSED BAUD-RATE PHASE DETECTOR FOR MULTI-LEVEL SIGNAL 73 4.4.2 DATA LEVEL AND DFE COEFFICIENT ADAPTATION 79 4.4.3 PROPOSED PHASE DETECTOR 84 4.5 MEASUREMENT RESULT 88 4.5.1 MEASUREMENT OF THE PROPOSED STOCHASTIC BAUD-RATE PHASE DETECTION 94 4.5.2 MEASUREMENT OF THE PROPOSED BAUD-RATE PHASE DETECTION FOR MULTI-LEVEL SIGNAL 97 CHAPTER 5 CONCLUSION 103 BIBLIOGRAPHY 105 ์ดˆ ๋ก 109๋ฐ•
    • โ€ฆ
    corecore