Search CORE

14 research outputs found

적응형 눈 감지 방법을 포함한 저전력 메모리 컨트롤러의 설계

Author: 김민오
Publication venue: 서울대학교 대학원
Publication date: 01/08/2017
Field of study

학위논문 (박사)-- 서울대학교 대학원 공과대학 전기·컴퓨터공학부, 2017. 8. 김수환.and the read margin was enhanced from 0.30UI and 76mV without AF-CTLE to 0.47UI and 80mV to with AF-CTLE. The power efficiency during burst write and read were 5.68pJ/bit and 1.83pJ/bit respectively.A 4266Mb/s/pin LPDDR4 memory controller with an asynchronous feedback continuous-time linear equalizer and an adaptive 3-step eye detection algorithm is presented. The asynchronous feedback continuous-time linear equalizer removes the glitch of DQS without training by applying an offset larger than the noise, and improves read margin by operating as a decision feedback equalizer in DQ path. The adaptive 3-step eye detection algorithm reduces power consumption and black-out time in initialization sequence and retraining in comparison to the 2-dimensional full scanning. In addition, the adaptive 3-step eye detection algorithm can maintain the accuracy by sequentially searching the eye boundaries and initializing the resolution using the binary search method when the eye detection result changes. To achieve high bandwidth, a transmitter and receiver suitable for training are proposed. The transmitter consists of a phase interpolator, a digitally-controlled delay line, a 16:1 serializer, a pre-driver and low-voltage swing terminated logic. The receiver consists of a reference voltage generator, a continuous-time linear equalizer, a phase interpolator, a digitally-controlled delay line, a 1:4 deserializer, and a 4:16 deserializer. The clocking architecture is also designed for low power consumption in idle periods, which are commonly lengthy in mobile applications. A prototype chip was implemented in a 65nm CMOS process with ball grid array package and tested with commodity LPDDR4. The write margin was 0.36UI and 148mVCHAPTER 1 INTRODUCTION 1 1.1 MOTIVATION 1 1.2 THESIS ORGANIZATION 5 CHAPTER 2 LPDDR4 6 2.1 COMPARISON BETWEEN LPDDR3 AND LPDDR4 6 2.2 SOURCE SYNCHRONOUS CLOCKING SCHEME 9 2.3 SIGNALING STANDARDS 11 2.4 MULTIPLE TRAININGS 14 2.5 RE-TRAINING AND RE-INITIALIZATION 16 CHAPTER 3 ADAPTIVE EYE DETECTION 18 3.1 EYE DETECTION 18 3.2 1X2Y3X EYE DETECTION 20 3.3 ADAPTIVE GAIN CONTROL 22 3.4 ADAPTIVE 1X2Y3X EYE DETECTION 24 CHAPTER 4 LPDDR4 MEMORY CONTROLLER 26 4.1 DESIGN PROCEDURE 26 4.2 ARCHITECTURE 30 4.2.1 TRANSMITTER 33 4.2.2 RECEIVER 35 4.2.3 CLOCKING ARCHITECTURE 38 4.3 CIRCUIT IMPLEMENTATION 43 4.3.1 ADPLL WITH MULTI-MODULUS DIVIDER 43 4.3.2 ADDLL WITH TRIANGULAR-MODULATED PI 45 4.3.3 CTLE WITH AUTO-DQS CLEANING 47 4.3.4 DES WITH CLOCK DOMAIN CROSSING 52 4.3.5 LVSTL WITH ZQ CALIBRATION 54 4.3.6 COARSE-FINE DCDL 56 4.4 LINK TRAINING 57 4.4.1 SIMULATION RESULTS 59 CHAPTER 5 MEASUREMENT RESULTS 72 5.1 MEASUREMENT SETUP 72 5.2 MEASUREMENT RESULTS OF SUB-BLOCK 80 5.2.1 ADPLL WITH MULTI-MODULUS DIVIDER 80 5.2.2 ADDLL WITH TRIANGULAR-MODULATED PI 82 5.2.3 COARSE-FINE DCDL 84 5.3 LPDDR4 INTERFACE MEASUREMENT RESULTS 84 CHAPTER 6 CONCLUSION 88 BIBLIOGRAPHY 90Docto

SNU Open Repository and Archive

메모리 인터페이스를 위한 4 레벨 펄스 진폭 변조 쿼터 레이트 수신기 설계

Author: 박현규
Publication venue: 서울대학교 대학원
Publication date: 01/08/2022
Field of study

학위논문(박사) -- 서울대학교대학원 : 공과대학 전기·정보공학부, 2022. 8. 김수환.본 연구에서는 메모리 인터페이스를 위한 4 레벨 펄스 진폭 변조 (PAM-4) 수신기와 직교 클록을 생성하는 직교 신호 보정기를 제안된다. 데이터 센터에서 증가하는 IP 트래픽은 고속 및 저전력 메모리 인터페이스에 대한 수요를 증가시켜왔다. 이러한 요구를 만족시키기 위해 클럭 및 나이퀴스트 주파수를 높이지 않고도 데이터 전송률을 높일 수 있는 PAM-4 신호가 주목을 받고 있다. PAM-4 신호는 제로 비 복귀 신호 (NRZ) 보다 3배 낮은 수직 마진을 가지며, 이는 결정 피드백 이퀄라이저 내 슬라이스의 클럭-큐 딜레이를 증가시키며, 이로 인해 PAM-4 결정 피드백 이퀄라이저의 성능을 제한하는 요인이다. 본 연구에서는 인버터 기반의 합산기를 이용, 선택적으로 신호를 증폭시키는 결정 피드백 이퀄라이저를 사용함으로써 슬라이서의 전력 소모를 증가시키지 않으면서 슬라이서의 클럭-큐 딜레이를 줄일 수 있다. 또한, 적응형 지연 이득 컨트롤러를 포함하는 직교 신호 보정기는 높은 정확도와 빠른 스큐 보정으로 쿼드러처 클럭 간의 스큐를 교정할 수 있다. 선택적 눈 증폭 결정 피드백 이퀄라이저와 적응형 지연 이득 컨트롤러를 포함하는 직교 신호 보정기의 성능을 검증하기 위해 프로토타입 칩을 제작하였다. 제작된 칩은 65 nm CMOS 공정으로 제작되었다. 프로토타입 칩은 24 Gb/s/pin 에서 10-12 의 비트 에러율을 100 mUI 의 신호 너비로 달성하였다. 프로토타입 칩 내 PAM-4 수신기는 0.73 pJ/b 의 에너지 효율을 갖는다. 또한 적응형 지연 이득 컨트롤러를 포함하는 직교 신호 보정기는 3 GHz 쿼드러처 클럭 간 최대 21.2 ps 의 스큐를 0.8 ps 까지 줄일 수 있으며, 이 때 76.9 ns 의 교정 시간을 갖는다. 제안하는 직교 신호 보정기는 3 GHz 에서 2.15 mW/GHz 의 전력 효율을 갖는다.A four-level pulse amplitude modulation (PAM-4) receiver, and a quadrature signal corrector (QSC) that generates quadrature clocks for memory interfaces is presented. Increasing IP traffic in data centers has increased the demand for high-speed and low-power memory interfaces. To satisfy this demand, PAM-4 signaling, which can increase data-rate without increasing clock and Nyquist frequency, is received considerable attention. PAM- signaling has vertical which three times lower than non-return-to-zero (NRZ) signaling, which makes the clock-to-Q delay of the slicer in the decision feedback equalizer (DFE) increases. This makes the DFE difficult to satisfy the timing constraint. In this paper, by using a DFE with inverter-based summers, the clock-to-Q delay of the slicer can be reduced without increasing the power consumption of the slicers. Also, the QSC using an adaptive delay gain controller can correct the skew between the quadrature clock with low skew and short correction time. The prototype receiver including the DFE with the inverter-based summer and the QSC using the adaptive delay gain controller was fabricated in 65 nm CMOS process. The prototype chip can achieve a bit error rate (BER) of 10-12 at 24 Gb/s/pin, and at this time, an eye width of 100 mUI is secured. The efficiency of the receiver is 0.73 pJ/b. In addition, the QSC cna reduce the maximum 21.2 ps of skew between 3 GHz quadrature clocks to 0.8 ps and has a correction time of 76.9 ns. The efficiency of the QSC is 2.15 mW/GHz.ABSTRACT 1 CONTENTS 3 LIST OF FIGURES 5 LIST OF TABLE 9 CHAPTER 1 1 INTRODUCTION 1 1.1 MOTIVATION 1 1.2 PAM-4 SIGNALING 7 1.2.1 DESIGN CONSIDERATIONS ON PAM-4 RECEIVER 10 1.2.2 PRIOR WORKS 14 1.3 QUARTER-RATE ARCHITECTURE 18 1.3.1 DESIGN CONSIDERATION ON QUARTER-RATE ARCHITECTURE 20 1.3.2 PRIOR WORKS 25 1.4 SUMMARY 28 1.5 THESIS ORGANIZATION 30 CHAPTER 2 31 CONCEPTS OF DFE WITH INVERTER-BASED SUMMER 31 2.1 CONCEPTUAL ARCHITECTURE OF DFE WITH INVERTER-BASED SUMMER 32 2.2 DESIGN CONSIDERATION OF INVERTER-BASED SUMMER 37 CHAPTER 3 41 CONCEPTS OF QUADRATURE SIGNAL CORRECTOR USING ADAPTIVE DELAY GAIN CONTROLLER 41 3.1 OPERATION OF PROPOSED QUADRATURE SIGNAL CORRECTOR 42 3.2 LOOP FILTER INCLUDING ADAPTIVE DELAY GAIN CONTROLLER 45 CHAPTER 4 48 ARCHITECTURE AND IMPLEMENTATION 48 4.1 OVERALL ARCHITECTURE 49 4.2 ANALOG FRONT END 52 4.3 DECISION FEEDBACK EQUALIZER WITH INVERTER-BASED SUMMER 54 4.4 CLOCK PATH 62 4.5 QUADRATURE SIGNAL CORRECTOR WITH ADAPTIVE DELAY GAIN CONTROLLER 63 CHAPTER 5 70 EXPERIMENTAL RESULTS 70 5.1 EXPERIMENTAL SETUP 70 5.2 EXPERIMENTAL RESULTS 74 5.2.1 MEASUREMENT RESULTS OF PAM-4 RECEIVER WITH DECISION FEEDBACK EQUALIZER USING INVERTER-BASED SUMMER 74 5.2.2 MEASUREMENT RESULTS OF QUADRATURE SIGNAL CORRECTOR USING ADAPTIVE DELAY GAIN CONTROLLER 77 CHAPTER 6 83 CONCLUSION 83 BIBLIOGRAPHY 86박

SNU Open Repository and Archive

Design Techniques for Energy Efficient Multi-GB/S Serial I/O Transceivers

Author: Song Younghoon
Publication venue
Publication date: 09/01/2015
Field of study

Total I/O bandwidth demand is growing in high-performance systems due to the emergence of many-core microprocessors and in mobile devices to support the next generation of multi-media features. High-speed serial I/O energy efficiency must improve in order to enable continued scaling of these parallel computing platforms in applications ranging from data centers to smart mobile devices. The first work, a low-power forwarded-clock I/O transceiver architecture is presented that employs a high degree of output/input multiplexing, supply-voltage scaling with data rate, and low-voltage circuit techniques to enable low-power operation. The transmitter utilizes a 4:1 output multiplexing voltage-mode driver along with 4-phase clocking that is efficiently generated from a passive poly-phase filter. The output driver voltage swing is accurately controlled from 100-200 mV_(ppd) using a low-voltage pseudo-differential regulator that employs a partial negative-resistance load for improved low frequency gain. 1:8 input de-multiplexing is performed at the receiver equalizer output with 8 parallel input samplers clocked from an 8-phase injection-locked oscillator that provides more than 1UI de-skew range. Low-power high-speed serial I/O transmitters which include equalization to compensate for channel frequency dependent loss are required to meet the aggressive link energy efficiency targets of future systems. The second work presents a low power serial link transmitter design that utilizes an output stage which combines a voltage-mode driver, which offers low static-power dissipation, and current-mode equalization, which offers low complexity and dynamic-power dissipation. The utilization of current-mode equalization decouples the equalization settings and termination impedance, allowing for a significant reduction in pre-driver complexity relative to segmented voltage-mode drivers. Proper transmitter series termination is set with an impedance control loop which adjusts the on-resistance of the output transistors in the driver voltage-mode portion. Further reductions in dynamic power dissipation are achieved through scaling the serializer and local clock distribution supply with data rate. Finally, it presents that a scalable quarter-rate transmitter employs an analog-controlled impedance-modulated 2-tap voltage-mode equalizer and achieves fast power-state transitioning with a replica-biased regulator and ILO clock generation. Capacitively-driven 2 mm global clock distribution and automatic phase calibration allows for aggressive supply scaling

Texas A&M Repository

A LPDDR4 MEMORY CONTROLLER DESIGN WITH EYE CENTER DETECTION ALGORITHM

Author: 홍기문
Publication venue: 서울대학교 대학원
Publication date: 01/02/2016
Field of study

학위논문 (박사)-- 서울대학교 대학원 : 전기·컴퓨터공학부, 2016. 2. 김수환.The demand for higher bandwidth with reduced power consumption in mobile memory is increasing. In this thesis, architecture of the LPDDR4 memory controller, operated with a LPDDR4 memory, is proposed and designed, and efficient training algorithm, which is appropriate for this architecture, is proposed for memory training and verification. The operation speed range of the LPDDR4 memory specification is from 533Mbps to 4266Mbps, and the LPDDR4 memory controller is designed to support that range of the LPDDR4 memory. The phase-locked loop in the LPDDR4 memory controller is designed to operate between 1333MHz and 2133MHz. To cover the range of the LPDDR4 memory, the selectable frequency divider is used to provide operation clock. The output frequency of the phase-locked loop with divider is from 266MHz to 2133MHz. The delay-locked loop in the LPDDR4 memory controller is designed to operate between 266MHz and 2133MHz with 180˚ phase locking. The delay-locked loop is used each training operation, which is command training, data read and write training. To complete training in each training stage, eye center detection algorithm is used. The circuits for the proposed eye center detection algorithm such as delay line, phase interpolator and reference generator are designed and validated. The proposed 1x2y3x eye center detection algorithm is 23 times faster than conventional two-dimensional eye center detection algorithm and it can be implemented simply. Using 65nm CMOS process, the proposed LPDDR4 memory controller occupies 12mm2. The verification of the LPDDR4 memory controller is performed with commodity LPDDR4 memory. The verification of all training sequence, which is power on, initializing, boot up, command training, write leveling, read training, write training, is performed in this environment. The low voltage swing terminated logic driver and other several functions, including write leveling and data transmission, are verified at 4266Mbps and the entire LPDDR4 memory controller operations from 566Mbps to 1600Mbps are verified. The proposed eye center detection algorithm is verified from 566Mbps to 2843Mbps.CHAPTER 1 INTRODUCTION 1 1.1 MOTIVATION 1 1.2 INTRODUCTION 5 1.3 THESIS ORGANIZATION 7 CHAPTER 2 LPDDR4 MEMORY CONTROLLER DESIGN 8 2.1 DIFFERENCE BETWEEN LPDDR3 AND LPDDR4 MEMORY 8 2.1.1 ARCHITECTURAL DIFFERENCE BETWEEN LPDDR3 AND LPDDR4 MEMORY 10 2.1.2 SOURCE SYNCHRONOUS MATCHED SCHEME AND UNMATCHED SCHEME 11 2.1.3 LOW VOLTAGE SWING TERMINATED LOGIC DRIVER AND TERMINATION SCHEME 12 2.2 LPDDR4 MEMORY CONTROLLER SPECIFICATION 15 2.3 DESIGN PROCEDURE 18 CHAPTER 3 LPDDR4 MEMORY CONTROLLER ARCHITECTURE BASED ON MEMORY TRAINING 20 3.1 LPDDR4 MEMORY TRAINING SEQUENCE 20 3.2 LPDDR4 MEMORY TRAINING EYE DETECTION ALGORITHM 24 3.2.1 EYE CENTER DETECTION 24 3.2.2 1X2Y3X EYE CENTER DETECTION ALGORITHM 27 3.3. LPDDR4 MEMORY CONTROLLER DESIGN BASED ON MEMORY TRAINING 31 3.3.1 ARCHITECTURE FOR MEMORY BOOT UP AND POWER UP 31 3.3.2 CLOCK PATH ARCHITECTURE AND CLOCK TREE 34 3.3.3 COMMAND TRAINING AND COMMAND PATH ARCHITECTURE 35 3.3.4 WRITE LEVELING AND DATA STROBE TRANSMISSION PATH ARCHITECTURE 39 3.3.5 READ TRAINING AND READ PATH ARCHITECTURE 41 3.3.6 WRITE TRAINING AND WRITE PATH ARCHITECTURE 43 3.3.7 NORMAL READ/WRITE OPERATION AND MARGIN TEST 46 CHAPTER 4 LPDDR4 MEMORY CONTROLLER ARCHITECTURE MODELING AND CIRCUIT DESIGN 48 4.1 OVERALL LPDDR4 MEMORY CONTROLLER ARCHITECTURE MODELING 48 4.2 SIMULATION RESULT OF LPDDR4 MEMORY CONTROLLER MODELING 51 4.3 LPDDR4 MEMORY CONTROLLER CIRCUIT DESIGN 61 4.3.1 PHASE-LOCKED LOOP 61 4.3.2 DELAY-LOCKED LOOP 65 4.3.3 TRANSMITTER OF LPDDR4 MEMORY CONTROLLER: WRITE PATH 70 4.3.4 DE-SERIALIZER WITH CLOCK DOMAIN CROSSING 75 CHAPTER 5 MEASUREMENT RESULT OF LPDDR4 MEMORY CONTROLLER 77 5.1 LPDDR4 MEMORY CONTROLLER MEASUREMENT SETUP 77 5.1.1 LPDDR4 MEMORY CONTROLLER FLOOR PLAN AND LAYOUT 77 5.1.2 PACKAGE AND TEST BOARD 79 5.2 LPDDR4 MEMORY CONTROLLER SUB-BLOCK MEASUREMENT 81 5.2.1 PHASE-LOCKED LOOP 81 5.2.2 DELAY-LOCKED LOOP 83 5.2.3 200PS AND 800PS DELAY LINE 85 5.2.4 VOLTAGE REFERENCE GENERATOR 86 5.2.5 PHASE INTERPOLATOR 87 5.3 LPDDR4 MEMORY SYSTEM OPERATION MEASUREMENT 90 CHAPTER 6 CONCLUSION 93 APPENDIX OPERATION FLOW CHART OF THE PROPOSED LPDDR4 MEMORY CONTROLLER 95 BIBLIOGRAPHY 118 KOREAN ABSTRACT 124Docto

SNU Open Repository and Archive

PHY Link Design and Optimization For High-Speed Low-Power Communication Systems

Author: Fang Yuan
Publication venue
Publication date: 01/01/2015
Field of study

The ever-growing demands for high-bandwidth data transfer have been pushing towards advancing research efforts in the field of high-performing communication systems. Studies on the performance of single chip, e.g. faster multi-core processors and higher system memory capacity, have been explored. To further enhance the system performance, researches have been focused on the improvement of data-transfer bandwidth for chip-to-chip communication in the high-speed serial link. Many solutions have been addressed to overcome the bottleneck caused by the non-idealties such as bandwidth-limited electrical channel that connects two link devices and varieties of undesired noise in the communication systems. Nevertheless, with these solutions data have run into limitations of the timing margins for high-speed interfaces running at multiple gigabits per second data rates on low-cost Printed Circuit Board (PCB) material with constrained power budget. Therefore, the challenge in designing a physical layer (PHY) link for high-speed communication systems turns out to be power-efficient, reliable and cost-effective. In this context, this dissertation is intended to focus on architectural design, system-level and circuit-level verification of a PHY link as well as system performance optimization in respective of power, reliability and adaptability in high-speed communication systems. The PHY is mainly composed of clock data recovery (CDR), equalizers (EQs) and high- speed I/O drivers. Symmetrical structure of the PHY link is usually duplicated in both link devices for bidirectional data transmission. By introducing training mechanisms into high-speed communication systems, the timing in one link device is adaptively aligned to the timing condition specified in the other link device despite of different skews or induced jitter resulting from process, voltage and temperature (PVT) variations in the individual link. With reliable timing relationships among the interface signals provided, the total system bandwidth is dramatically improved. On the other hand, interface training offers high flexibility for reuse without further investigation on high demanding components involved in high costs. In the training mode, a CDR module is essential for reconstructing the transmitted bitstream to achieve the best data eye and to detect the edges of data stream in asynchronous systems or source-synchronous systems. Generally, the CDR works as a feedback control system that aligns its output clock to the center of the received data. In systems that contain multiple data links, the overall CDR power consumption increases linearly with the increase in number of links as one CDR is required for each link. Therefore, a power-efficient CDR plays a significant role in such systems with parallel links. Furthermore, a high performance CDR requires low jitter generation in spite of high input jitter. To minimize the trade-off between power consumption and CDR jitter, a novel CDR architecture is proposed by utilizing the proportional-integral (PI) controller and three times sampling scheme. Meanwhile, signal integrity (SI) becomes critical as the data rate exceeds several gigabits per second. Distorted data due to the non-idealties in systems are likely to reduce the signal quality aggressively and result in intolerable transmission errors in worst case scenarios, thus affect the system effective bandwidth. Hence, additional trainings such as transmitter (Tx) and receiver (Rx) EQ trainings for SI purpose are inserted into the interface training. Besides, a simplified system architecture with unsymmetrical placement of adaptive Rx and Tx EQs in a single link device is proposed and analyzed by using different coefficient adaptation algorithms. This architecture enables to reduce a large number of EQs through the training, especially in case of parallel links. Meanwhile, considerable power and chip area are saved. Finally, high-speed I/O driver against PVT variations is discussed. Critical issues such as overshoot and undershoot interfering with the data are primarily accompanied by impedance mismatch between the I/O driver and its transmitting channel. By applying PVT compensation technique I/O driver impedances can be effectively calibrated close to the target value. Different digital impedance calibration algorithms against PVT variations are implemented and compared for achieving fast calibration and low power requirements

TUbiblio

tuprints

Toward realizing power scalable and energy proportional high-speed wireline links

Author: Anand Tejasvi
Publication venue
Publication date: 01/12/2015
Field of study

Growing computational demand and proliferation of cloud computing has placed high-speed serial links at the center stage. Due to saturating energy efficiency improvements over the last five years, increasing the data throughput comes at the cost of power consumption. Conventionally, serial link power can be reduced by optimizing individual building blocks such as output drivers, receiver, or clock generation and distribution. However, this approach yields very limited efficiency improvement. This dissertation takes an alternative approach toward reducing the serial link power. Instead of optimizing the power of individual building blocks, power of the entire serial link is reduced by exploiting serial link usage by the applications. It has been demonstrated that serial links in servers are underutilized. On average, they are used only 15% of the time, i.e. these links are idle for approximately 85% of the time. Conventional links consume power during idle periods to maintain synchronization between the transmitter and the receiver. However, by powering-off the link when idle and powering it back when needed, power consumption of the serial link can be scaled proportionally to its utilization. This approach of rapid power state transitioning is known as the rapid-on/off approach. For the rapid-on/off to be effective, ideally the power-on time, off-state power, and power state transition energy must all be close to zero. However, in practice, it is very difficult to achieve these ideal conditions. Work presented in this dissertation addresses these challenges. When this research work was started (2011-12), there were only a couple of research papers available in the area of rapid-on/off links. Systematic study or design of a rapid power state transitioning in serial links was not available in the literature. Since rapid-on/off with nanoseconds granularity is not a standard in any wireline communication, even the popular test equipment does not support testing any such feature, neither any formal measurement methodology was available. All these circumstances made the beginning difficult. However, these challenges provided a unique opportunity to explore new architectural techniques and identify trade-offs. The key contributions of this dissertation are as follows. The first and foremost contribution is understanding the underlying limitations of saturating energy efficiency improvements in serial links and why there is a compelling need to find alternative ways to reduce the serial link power. The second contribution is to identify potential power saving techniques and evaluate the challenges they pose and the opportunities they present. The third contribution is the design of a 5Gb/s transmitter with a rapid-on/off feature. The transmitter achieves rapid-on/off capability in voltage mode output driver by using a fast-digital regulator, and in the clock multiplier by accurate frequency pre-setting and periodic reference insertion. To ease timing requirements, an improved edge replacement logic circuit for the clock multiplier is proposed. Mathematical modeling of power-on time as a function of various circuit parameters is also discussed. The proposed transmitter demonstrates energy proportional operation over wide variations of link utilization, and is, therefore, suitable for energy efficient links. Fabricated in 90nm CMOS technology, the voltage mode driver, and the clock multiplier achieve power-on-time of only 2ns and 10ns, respectively. This dissertation highlights key trade-off in the clock multiplier architecture, to achieve fast power-on-lock capability at the cost of jitter performance. The fourth contribution is the design of a 7GHz rapid-on/off LC-PLL based clock multi- plier. The phase locked loop (PLL) based multiplier was developed to overcome the limita- tions of the MDLL based approach. Proposed temperature compensated LC-PLL achieves power-on-lock in 1ns. The fifth and biggest contribution of this dissertation is the design of a 7Gb/s embedded clock transceiver, which achieves rapid-on/off capability in LC-PLL, current-mode transmit- ter and receiver. It was the first reported design of a complete transceiver, with an embedded clock architecture, having rapid-on/off capability. Background phase calibration technique in PLL and CDR phase calibration logic in the receiver enable instantaneous lock on power-on. The proposed transceiver demonstrates power scalability with a wide range of link utiliza- tion and, therefore, helps in improving overall system efficiency. Fabricated in 65nm CMOS technology, the 7Gb/s transceiver achieves power-on-lock in less than 20ns. The transceiver achieves power scaling by 44x (63.7mW-to-1.43mW) and energy efficiency degradation by only 2.2x (9.1pJ/bit-to-20.5pJ/bit), when the effective data rate (link utilization) changes by 100x (7Gb/s-to-70Mb/s). The sixth and final contribution is the design of a temperature sensor to compensate the frequency drifts due to temperature variations, during long power-off periods, in the fast power-on-lock LC-PLL. The proposed self-referenced VCO-based temperature sensor is designed with all digital logic gates and achieves low supply sensitivity. This sensor is suitable for integration in processor and DRAM environments. The proposed sensor works on the principle of directly converting temperature information to frequency and finally to digital bits. A novel sensing technique is proposed in which temperature information is acquired by creating a threshold voltage difference between the transistors used in the oscillators. Reduced supply sensitivity is achieved by employing junction capacitance, and the overhead of voltage regulators and an external ideal reference frequency is avoided. The effect of VCO phase noise on the sensor resolution is mathematically evaluated. Fabricated in the 65nm CMOS process, the prototype can operate with a supply ranging from 0.85V to 1.1V, and it achieves a supply sensitivity of 0.034oC/mV and an inaccuracy of ±0.9oC and ±2.3oC from 0-100oC after 2-point calibration, with and without static nonlinearity correction, respectively. It achieves a resolution of 0.3oC, resolution FoM of 0.3(nJ/conv)res2 , and measurement (conversion) time of 6.5μs

Illinois Digital Environment for Access to Learning and Scholarship Repository

Design of energy efficient high speed I/O interfaces

Author: Talegaonkar Mrunmay Vyankatesh
Publication venue
Publication date: 01/05/2016
Field of study

Energy efficiency has become a key performance metric for wireline high speed I/O interfaces. Consequently, design of low power I/O interfaces has garnered large interest that has mostly been focused on active power reduction techniques at peak data rate. In practice, most systems exhibit a wide range of data transfer patterns. As a result, low energy per bit operation at peak data rate does not necessarily translate to overall low energy operation. Therefore, I/O interfaces that can scale their power consumption with data rate requirement are desirable. Rapid on-off I/O interfaces have a potential to scale power with data rate requirements without severely affecting either latency or the throughput of the I/O interface. In this work, we explore circuit techniques for designing rapid on-off high speed wireline I/O interfaces and digital fractional-N PLLs. A burst-mode transmitter suitable for rapid on-off I/O interfaces is presented that achieves 6 ns turn-on time by utilizing a fast frequency settling ring oscillator in digital multiplying delay-locked loop and a rapid on-off biasing scheme for current mode output driver. Fabricated in 90 nm CMOS process, the prototype achieves 2.29 mW/Gb/s energy efficiency at peak data rate of 8 Gb/s. A 125X (8 Gb/s to 64 Mb/s) change in effective data rate results in 67X (18.29 mW to 0.27 mW) change in transmitter power consumption corresponding to only 2X (2.29 mW/Gb/s to 4.24 mW/Gb/s) degradation in energy efficiency for 32-byte long data bursts. We also present an analytical bit error rate (BER) computation technique for this transmitter under rapid on-off operation, which uses MDLL settling measurement data in conjunction with always-on transmitter measurements. This technique indicates that the BER bathtub width for 10^(−12) BER is 0.65 UI and 0.72 UI during rapid on-off operation and always-on operation, respectively. Next, a pulse response estimation-based technique is proposed enabling burst-mode operation for baud-rate sampling receivers that operate over high loss channels. Such receivers typically employ discrete time equalization to combat inter-symbol interference. Implementation details are provided for a receiver chip, fabricated in 65nm CMOS technology, that demonstrates efficacy of the proposed technique. A low complexity pulse response estimation technique is also presented for low power receivers that do not employ discrete time equalizers. We also present techniques for implementation of highly digital fractional-N PLL employing a phase interpolator based fractional divider to improve the quantization noise shaping properties of a 1-bit ∆Σ frequency-to-digital converter. Fabricated in 65nm CMOS process, the prototype calibration-free fractional-N Type-II PLL employs the proposed frequency-to-digital converter in place of a high resolution time-to-digital converter and achieves 848 fs rms integrated jitter (1 kHz-30 MHz) and -101 dBc/Hz in-band phase noise while generating 5.054 GHz output from 31.25 MHz input

Illinois Digital Environment for Access to Learning and Scholarship Repository

Source-synchronous I/O Links using Adaptive Interface Training for High Bandwidth Applications

Author: Jaiswal Ashok Kumar
Publication venue
Publication date: 16/07/2014
Field of study

Mobility is the key to the global business which requires people to be always connected to a central server. With the exponential increase in smart phones, tablets, laptops, mobile traffic will soon reach in the range of Exabytes per month by 2018. Applications like video streaming, on-demand-video, online gaming, social media applications will further increase the traffic load. Future application scenarios, such as Smart Cities, Industry 4.0, Machine-to-Machine (M2M) communications bring the concepts of Internet of Things (IoT) which requires high-speed low power communication infrastructures. Scientific applications, such as space exploration, oil exploration also require computing speed in the range of Exaflops/s by 2018 which means TB/s bandwidth at each memory node. To achieve such bandwidth, Input/Output (I/O) link speed between two devices needs to be increased to GB/s. The data at high speed between devices can be transferred serially using complex Clock-Data-Recovery (CDR) I/O links or parallely using simple source-synchronous I/O links. Even though CDR is more efficient than the source-synchronous method for single I/O link, but to achieve TB/s bandwidth from a single device, additional I/O links will be required and the source-synchronous method will be more advantageous in terms of area and power requirements as additional I/O links do not require extra hardware resources. At high speed, there are several non-idealities (Supply noise, crosstalk, Inter- Symbol-Interference (ISI), etc.) which create unwanted skew problem among parallel source-synchronous I/O links. To solve these problems, adaptive trainings are used in time domain to synchronize parallel source-synchronous I/O links irrespective of these non-idealities. In this thesis, two novel adaptive training architectures for source-synchronous I/O links are discussed which require significantly less silicon area and power in comparison to state-of-the-art architectures. First novel adaptive architecture is based on the unit delay concept to synchronize two parallel clocks by adjusting the phase of one clock in only one direction. Second novel adaptive architecture concept consists of Phase Interpolator (PI)-based Phase Locked Loop (PLL) which can adjust the phase in both direction and achieve faster synchronization at the expense of added complexity. With an increase in parallel I/O links, clock skew which is generated by the improper clock tree, also affects the timing margin. Incorrect duty cycle further reduces the timing margin mainly in Double Data Rate (DDR) systems which are generally used to increase the bandwidth of a high-speed communication system. To solve clock skew and duty cycle problems, a novel clock tree buffering algorithm and a novel duty cycle corrector are described which further reduce the power consumption of a source-synchronous system

TUbiblio

tuprints

Design Techniques for High Performance Serial Link Transceivers

Author: Min Byungho
Publication venue
Publication date: 21/08/2017
Field of study

Increasing data rates over electrical channels with significant frequency-dependent loss is difficult due to excessive inter-symbol interference (ISI). In order to achieve sufficient link margins at high rates, I/O system designers implement equalization in the transmitters and are motivated to consider more spectrally-efficient modulation formats relative to the common PAM-2 scheme, such as PAM-4 and duobinary. The first work, reviews when to consider PAM-4 and duobinary formats, as the modulation scheme which yields the highest system margins at a given data rate is a function of the channel loss profile, and presents a 20Gb/s triple-mode transmitter capable of efficiently implementing these three modulation schemes and three-tap feedforward equalization. A statistical link modeling tool, which models ISI, crosstalk, random noise, and timing jitter, is developed to compare the three common modulation formats operating on electrical backplane channel models. In order to improve duobinary modulation efficiency, a low-power quarter-rate duobinary precoder circuit is proposed which provides significant timing margin improvement relative to full-rate precoders. Also as serial I/O data rates scale above 10 Gb/s, crosstalk between neighboring channels degrades system bit-error rate (BER) performance. The next work presents receive-side circuitry which merges the cancellation of both near-end and far-end crosstalk (NEXT/FEXT) and can automatically adapt to different channel environments and variations in process, voltage, and temperature. NEXT cancellation is realized with a novel 3-tap FIR filter which combines two traditional FIR filter taps and a continuous-time band-pass filter IIR tap for efficient crosstalk cancellation, with all filter tap coefficients automatically determined via an ondie sign-sign least-mean-square (SS-LMS) adaptation engine. FEXT cancellation is realized by coupling the aggressor signal through a differentiator circuit whose gain is automatically adjusted with a power-detection-based adaptation loop. In conclusion, the proposed architectures in the transmitter side and receiver side together are to be good solution in the high speed I/O serial links to improve the performance by overcome the physical channel loss and adjacent channel noise as the system becomes complicated

Texas A&M Repository

고속 DRAM 인터페이스를 위한 전압 및 온도에 둔감한 클록 패스와 위상 오류 교정기 설계

Author: 신소영
Publication venue: 서울대학교 대학원
Publication date: 01/02/2021
Field of study

학위논문 (박사) -- 서울대학교 대학원 : 공과대학 전기·정보공학부, 2021. 2. 정덕균.To cope with problems caused by the high-speed operation of the dynamic random access memory (DRAM) interface, several approaches are proposed that are focused on the clock path of the DRAM. Two delay-locked loop (DLL) based schemes, a forwarded-clock (FC) receiver (RX) with self-tracking loop and a quadrature error corrector, are proposed. Moreover, an open-loop based scheme is presented for drift compensation in the clock distribution. The open-loop scheme consumes less power consumption and reduces design complexity. The FC RX uses DLLs to compensate for voltage and temperature (VT) drift in unmatched memory interfaces. The self-tracking loop consists of two-stage cascaded DLLs to operate in a DRAM environment. With the write training and the proposed DLL, the timing relationship between the data and the sampling clock is always optimal. The proposed scheme compensates for delay drift without relying on data transitions or re-training. The proposed FC RX is fabricated in 65-nm CMOS process and has an active area containing 4 data lanes of 0.0329 mm2. After the write training is completed at the supply voltage of 1 V, the measured timing margin remains larger than 0.31-unit interval (UI) when the supply voltage drifts in the range of 0.94 V and 1.06 V from the training voltage, 1 V. At the data rate of 6.4 Gb/s, the proposed FC RX achieves an energy efficiency of 0.45 pJ/bit. Contrary to the aforementioned scheme, an open-loop-based voltage drift compensation method is proposed to minimize power consumption and occupied area. The overall clock distribution is composed of a current mode logic (CML) path and a CMOS path. In the proposed scheme, the architecture of the CML-to-CMOS converter (C2C) and the inverter is changed to compensate for supply voltage drift. The bias generator provides bias voltages to the C2C and inverters according to supply voltage for delay adjustment. The proposed clock tree is fabricated in 40 nm CMOS process and the active area is 0.004 mm2. When the supply voltage is modulated by a sinusoidal wave with 1 MHz, 100 mV peak-to-peak swing from the center of 1.1 V, applying the proposed scheme reduces the measured root-mean-square (RMS) jitter from 3.77 psRMS to 1.61 psRMS. At 6 GHz output clock, the power consumption of the proposed scheme is 11.02 mW. A DLL-based quadrature error corrector (QEC) with a wide correction range is proposed for the DRAM whose clocks are distributed over several millimeters. The quadrature error is corrected by adjusting delay lines using information from the phase error detector. The proposed error correction method minimizes increased jitter due to phase error correction by setting at least one of the delay lines in the quadrature clock path to the minimum delay. In addition, the asynchronous calibration on-off scheme reduces power consumption after calibration is complete. The proposed QEC is fabricated in 40 nm CMOS process and has an active area of 0.048 mm2. The proposed QEC exhibits a wide correctable error range of 101.6 ps and the remaining phase errors are less than 2.18° from 0.8 GHz to 2.3 GHz clock. At 2.3 GHz, the QEC contributes 0.53 psRMS jitter. Also, at 2.3 GHz, the power consumption is reduced from 8.89 mW to 3.39 mW when the calibration is off.본 논문에서는 동적 랜덤 액세스 메모리 (DRAM)의 속도가 증가함에 따라 클록 패스에서 발생할 수 있는 문제에 대처하기 위한 세 가지 회로들을 제안하였다. 제안한 회로들 중 두 방식들은 지연동기루프 (delay-locked loop) 방식을 사용하였고 나머지 한 방식은 면적과 전력 소모를 줄이기 위해 오픈 루프 방식을 사용하였다. DRAM의 비정합 수신기 구조에서 데이터 패스와 클록 패스 간의 지연 불일치로 인해 전압 및 온도 변화에 따라 셋업 타임 및 홀드 타임이 줄어드는 문제를 해결하기 위해 지연동기루프를 사용하였다. 제안한 지연동기루프 회로는 DRAM 환경에서 동작하도록 두 개의 지연동기루프로 나누었다. 또한 초기 쓰기 훈련을 통해 데이터와 클록을 타이밍 마진 관점에서 최적의 위치에 둘 수 있다. 따라서 제안하는 방식은 데이터 천이 정보가 필요하지 않다. 65-nm CMOS 공정을 이용하여 만들어진 칩은 6.4 Gb/s에서 0.45 pJ/bit의 에너지 효율을 가진다. 또한 1 V에서 쓰기 훈련 및 지연동기루프를 고정시키고 0.94 V에서 1.06 V까지 공급 전압이 바뀌었을 때 타이밍 마진은 0.31 UI보다 큰 값을 유지하였다. 다음으로 제안하는 회로는 클록 분포 트리에서 전압 변화로 인해 클록 패스의 지연이 달라지는 것을 앞서 제시한 방식과 달리 오픈 루프 방식으로 보상하였다. 기존 클록 패스의 인버터와 CML-to-CMOS 변환기의 구조를 변경하여 바이어스 생성 회로에서 생성한 공급 전압에 따라 바뀌는 바이어스 전압을 가지고 지연을 조절할 수 있게 하였다. 40-nm CMOS 공정을 이용하여 만들어진 칩의 6 GHz 클록에서의 전력 소모는 11.02 mW로 측정되었다. 1.1 V 중심으로 1 MHz, 100 mV 피크 투 피크를 가지는 사인파 성분으로 공급 전압을 변조하였을 때 제안한 방식에서의 지터는 기존 방식의 3.77 psRMS에서 1.61 psRMS로 줄어들었다. DRAM의 송신기 구조에서 다중 위상 클록 간의 위상 오차는 송신된 데이터의 데이터 유효 창을 감소시킨다. 이를 해결하기 위해 지연동기루프를 도입하게 되면 증가된 지연으로 인해 위상이 교정된 클록에서 지터가 증가한다. 본 논문에서는 증가된 지터를 최소화하기 위해 위상 교정으로 인해 증가된 지연을 최소화하는 위상 교정 회로를 제시하였다. 또한 유휴 상태에서 전력 소모를 줄이기 위해 위상 오차를 교정하는 회로를 입력 클록과 비동기식으로 끌 수 있는 방법 또한 제안하였다. 40-nm CMOS 공정을 이용하여 만들어진 칩의 위상 교정 범위는 101.6 ps이고 0.8 GHz 부터 2.3 GHz까지의 동작 주파수 범위에서 위상 교정기의 출력 클록의 위상 오차는 2.18°보다 작다. 제안하는 위상 교정 회로로 인해 추가된 지터는 2.3 GHz에서 0.53 psRMS이고 교정 회로를 껐을 때 전력 소모는 교정 회로가 켜졌을 때인 8.89 mW에서 3.39 mW로 줄어들었다.Chapter 1 Introduction 1 1.1 Motivation 1 1.2 Thesis Organization 4 Chapter 2 Background on DRAM Interface 5 2.1 Overview 5 2.2 Memory Interface 7 Chapter 3 Background on DLL 11 3.1 Overview 11 3.2 Building Blocks 15 3.2.1 Delay Line 15 3.2.2 Phase Detector 17 3.2.3 Charge Pump 19 3.2.4 Loop filter 20 Chapter 4 Forwarded-Clock Receiver with DLL-based Self-tracking Loop for Unmatched Memory Interfaces 21 4.1 Overview 21 4.2 Proposed Separated DLL 25 4.2.1 Operation of the Proposed Separated DLL 27 4.2.2 Operation of the Digital Loop Filter in DLL 31 4.3 Circuit Implementation 33 4.4 Measurement Results 37 4.4.1 Measurement Setup and Sequence 38 4.4.2 VT Drift Measurement and Simulation 40 Chapter 5 Open-loop-based Voltage Drift Compensation in Clock Distribution 46 5.1 Overview 46 5.2 Prior Works 50 5.3 Voltage Drift Compensation Method 52 5.4 Circuit Implementation 57 5.5 Measurement Results 61 Chapter 6 Quadrature Error Corrector with Minimum Total Delay Tracking 68 6.1 Overview 68 6.2 Prior Works 70 6.3 Quadrature Error Correction Method 73 6.4 Circuit Implementation 82 6.5 Measurement Results 88 Chapter 7 Conclusion 96 Bibliography 98 초록 102Docto

SNU Open Repository and Archive