136 research outputs found

A 0.8 – 2.4 Gbps Driver with Adjustable De-Emphasis Scheme for DDR3 Memory Interface

Author: Lim Zong Zheng
Publication date: 01/01/2014

The need for greater memory bandwidth to boost computer system performance has driven system memory evolution toward Double Data Rate Synchronous Dynamic Random Access Memory (DDR SDRAM) technologies. Efforts to maximize memory bandwidth have made Inter-Symbol Interference (ISI) significant, which degrades the signal integrity of transmitted data. In this research, a driver architecture with an adjustable de-emphasis and impedance control scheme is proposed for high-data-rate, high-density DDR3 SDRAM memory systems. The proposed driver is implemented in a 45 nm CMOS process technology. The work covers the design of the driver architecture, the data controller, and the impedance calibration block with its reference generator, as well as the layout of the critical analog circuits (three driver segments) for post-layout simulation, to ensure that layout parasitics do not significantly affect driver performance. The driver has 15 de-emphasis legs that can form 15 de-emphasis voltage levels, which are capable of reducing ISI-induced jitter at high operating frequencies. Moreover, a high-density DDR3 memory system can degrade the far-end eye jitter and eye height, which makes data sampling and recovery difficult. The driving impedance of the proposed driver can therefore be programmed to 20, 30, or 40 Ω to compensate for the variability of board routing effects in the memory system and hence improve signal integrity.
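
The de-emphasis idea in the abstract above can be illustrated with a generic two-tap transmit filter. This is a minimal sketch and not the thesis's circuit: the function name `de_emphasize`, the +1/-1 symbol mapping, and the tap weight `alpha` are assumptions made for illustration, and the 15 programmable levels of the actual driver are not modeled.

```python
def de_emphasize(bits, alpha):
    """Apply a generic 2-tap de-emphasis filter to a bit sequence.

    Bits are mapped to +1/-1 symbols. The output is
    (1 - alpha) * x[n] - alpha * x[n-1], so a transition keeps full
    swing (magnitude 1) while a repeated bit is reduced to 1 - 2*alpha.
    """
    symbols = [1.0 if b else -1.0 for b in bits]
    out = []
    prev = symbols[0]  # assumption: the line idled at the first symbol's level
    for x in symbols:
        out.append((1.0 - alpha) * x - alpha * prev)
        prev = x
    return out


if __name__ == "__main__":
    # Hypothetical tap weight; repeated bits are attenuated, transitions are not.
    print(de_emphasize([0, 1, 1, 1, 0], 0.25))
```

With `alpha = 0.25`, the run of three ones is sent as 1.0, 0.5, 0.5: the first bit after a transition gets the full swing and the repeats are attenuated, which is what reduces ISI on a band-limited channel.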

Repository@USM

An Energy-Efficient Reconfigurable Mobile Memory Interface for Computing Systems

Author: Far Majid Jalali
Publication venue: The Research Repository @ WVU
Publication date: 01/01/2016

The need for transceiver designs with higher power efficiency and bandwidth has grown significantly as mobile devices such as smartphones, laptops, tablets, and ultra-portable personal digital assistants continue to be built from heterogeneous intellectual properties, such as central processing units (CPUs), graphics processing units (GPUs), digital signal processors, dynamic random-access memories (DRAMs), sensors, and graphics/image processing units, and to gain enhanced graphics computing and video processing capabilities. However, current mobile interface technologies that support CPU-to-memory communication (e.g. baseband-only signaling) have critical limitations, particularly super-linear energy consumption, limited bandwidth, and non-reconfigurable data access. As a consequence, there is a critical need to improve both energy efficiency and bandwidth for future mobile devices.
The primary goal of this study is to design an energy-efficient reconfigurable mobile memory interface for mobile computing systems in order to dramatically enhance circuit and system bandwidth and power efficiency. The proposed energy-efficient mobile memory interface, which uses advanced baseband (BB) signaling and RF-band signaling, is capable of simultaneous bi-directional communication and reconfigurable data access. It also increases power efficiency and bandwidth between mobile CPUs and memory subsystems on a single-ended shared transmission line. Moreover, because multiple data streams are carried on a single-ended shared transmission line, the number of transmission lines between the mobile CPU and the memories is considerably reduced. This enables significant technological innovations (e.g. more compact devices and low-cost packaging for the mobile communication interface) and establishes the principles and feasibility of technologies for future mobile system applications.
The operation and performance of the proposed transceiver are analyzed, and its circuit implementation is discussed in detail. A chip prototype of the transceiver was implemented in a 65 nm CMOS process technology. In measurement, the transceiver exhibits higher aggregate data throughput and better energy efficiency than prior work.

The Research Repository @ WVU (West Virginia University)

Design of Multi-Level Single-Ended Transmitters for Memory Interfaces (메모리 인터페이스를 위한 멀티 레벨 단일 종단 송신기 설계)

Author: 정용운
Publication venue: Seoul National University Graduate School (서울대학교 대학원)
Publication date: 01/08/2020

Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering (전기·컴퓨터공학부), August 2020. 김수환.
Multi-level transmitters for memory interfaces are presented. The performance gap between processor and memory has been increasing by 50% every year, making memory a bottleneck of the overall system. To increase memory bandwidth, we propose a PAM-4 single-ended transmitter. To compensate for the side effects of multi-rank memory, we propose a reflection-based duobinary transmitter. The driver of the proposed PAM-4 transmitter simultaneously satisfies impedance matching and high linearity, and it occupies a small area because its structure uses no resistors or inductors. The proposed ZQ calibration for PAM-4 has three calibration points, which give the transmitter accurate impedance and a linear output. The ZQ calibration considers impedance variation of both the driver and the receiver. A prototype was fabricated in a 65 nm CMOS process, and the transmitter occupies 0.0333 mm². The measured eye has a width of 18.3 ps and a height of 42.4 mV at 28 Gb/s, and the measured energy efficiency is 0.64 pJ/b. The measured RLM with the 3-point ZQ calibration is 0.993.
To increase memory density, stacked-die packaging, in which multiple DRAM dies are stacked vertically in one package, is widely used. Combined with the center-pad structure, however, this packaging creates stubs that cause short reflections. We propose the reflection-based duobinary transmitter to mitigate this problem. The proposed transmitter uses the reflection for duobinary signaling. A 2-tap opposite FFE and slew-rate control are used to increase signal integrity. The measured duobinary eye at 10 Gb/s has a width of 63.6 ps and a height of 70.8 mV, while there is no NRZ eye opening. The measured energy efficiency is 1.38 pJ/bit.
Contents:
Chapter 1. Introduction: 1.1 Motivation; 1.2 Thesis Organization
Chapter 2. Multi-Level Signaling: 2.1 PAM-4 Signaling; 2.2 Design Considerations for PAM-4 Transmitter (2.2.1 Level Separation Mismatch Ratio (RLM); 2.2.2 Impedance Matching; 2.2.3 Prior Arts); 2.3 Duobinary Signaling
Chapter 3. High-Linearity and Impedance-Matched PAM-4 Transmitter: 3.1 Overall Architecture; 3.2 Single-Ended Impedance-Matched PAM-4 Driver; 3.3 3-Point ZQ Calibration for PAM-4
Chapter 4. Reflection-Based Duobinary Transmitter: 4.1 Bidirectional Dual-Rank Memory System; 4.2 Concept of Reflection-Based Duobinary Signaling; 4.3 Reflection-Based Duobinary Transmitter (4.3.1 Overall Architecture; 4.3.2 Equalization for Reflection-Based Duobinary Signaling; 4.3.3 2D Binary-Segmented Driver)
Chapter 5. Experimental Results: 5.1 High-Linearity and Impedance-Matched PAM-4 Transmitter; 5.2 Reflection-Based Duobinary Transmitter
Chapter 6. Conclusion
Bibliography
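
Two of the figures quoted in this abstract, the level separation mismatch ratio (RLM) and the energy efficiency in pJ/bit, are simple to compute. The sketch below is illustrative only: it uses the common form of RLM (three times the smallest adjacent level gap divided by the full swing), the function names are hypothetical, and the level values in the usage example are made up rather than taken from the thesis.

```python
def rlm(levels):
    """Level separation mismatch ratio for four PAM-4 levels.

    Uses the common form 3 * (smallest adjacent gap) / (full swing),
    which equals 1.0 for perfectly evenly spaced levels.
    """
    v0, v1, v2, v3 = sorted(levels)
    smallest_gap = min(v1 - v0, v2 - v1, v3 - v2)
    return 3.0 * smallest_gap / (v3 - v0)


def power_mw(energy_pj_per_bit, rate_gbps):
    """Average power in mW implied by an energy-per-bit figure.

    pJ/bit * Gbit/s = 1e-12 J * 1e9 1/s = 1e-3 W, i.e. milliwatts.
    """
    return energy_pj_per_bit * rate_gbps


if __name__ == "__main__":
    # Made-up output levels (in volts) with a slightly compressed lower eye.
    print(rlm([0.00, 0.09, 0.20, 0.30]))
    # Energy figures quoted in the abstract.
    print(power_mw(0.64, 28.0), power_mw(1.38, 10.0))
```

Under this arithmetic, 0.64 pJ/b at 28 Gb/s corresponds to roughly 17.9 mW, and 1.38 pJ/bit at 10 Gb/s to roughly 13.8 mW.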

SNU Open Repository and Archive

Design of a Controller PHY for High-Capacity DRAM with a Pulse-Based Feed-Forward Equalizer (펄스 기반 피드 포워드 이퀄라이저를 갖춘 고용량 DRAM을 위한 컨트롤러 PHY 설계)

Author: 고형준
Publication venue: Seoul National University Graduate School (서울대학교 대학원)
Publication date: 01/08/2020

Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Information Engineering (전기·정보공학부), August 2020. 김수환.
A controller PHY is presented for the managed DRAM solution (MDS), a new memory structure that maximizes capacity while minimizing refresh power. Inter-symbol interference (ISI) is critical in such a high-capacity DRAM interface, in which many DRAM chips share a command/address (C/A) channel. A pulse-based feed-forward equalizer (PB-FFE) is introduced to reduce ISI on the C/A channel. The controller PHY supports all the training sequences specified in the DDR4 standard. A glitch-free digitally controlled delay line (DCDL) is also adopted to perform link training efficiently and to reduce training time. The DQ transmitter adopts a quarter-rate architecture to reduce output latency. In a quarter-rate transmitter, phase errors between the quadrature clocks affect the integrity of the output signal. To minimize this effect, we propose a quadrature error corrector (QEC), in which clock signal phase errors are corrected using two replicas of the 4:1 serializer of the output stage. Pulse-shrinking delay lines are used to compare and equalize the outputs of these two replica serializers. The controller PHY was fabricated in a 55 nm CMOS process. The PB-FFE increases the C/A channel timing margin from 0.23 UI to 0.29 UI at 1067 Mbps. At 2133 Mbps, the read timing and voltage margins are 0.53 UI and 211 mV after read training, and the write margins are 0.72 UI and 230 mV after write training. To validate the effectiveness of the QEC, a prototype quarter-rate transmitter including the QEC was fabricated on another chip in 65 nm CMOS. With the QEC, the experimental results show that the output phase errors of the transmitter are reduced to a residual error of 0.8 ps, and the output eye width and height are improved by 84% and 61%, respectively, at a data rate of 12.8 Gbps.
Contents:
Chapter 1. Introduction: 1.1 Motivation (1.1.1 Heavy Load C/A Channel; 1.1.2 Quarter-Rate Architecture in DQ Transmitter; 1.1.3 Summary); 1.2 Thesis Organization
Chapter 2. Architecture: 2.1 MDS DIMM Structure; 2.2 MDS Controller; 2.3 MDS Controller PHY (2.3.1 Initialization Sequence; 2.3.2 Link Training Finite-State Machine; 2.3.3 Power Down Mode)
Chapter 3. Pulse-Based Feed-Forward Equalizer: 3.1 Command/Address Channel; 3.2 Command/Address Transmitter; 3.3 Pulse-Based Feed-Forward Equalizer
Chapter 4. Circuit Implementation: 4.1 Building Blocks (4.1.1 All-Digital Phase-Locked Loop (ADPLL); 4.1.2 All-Digital Delay-Locked Loop (ADDLL); 4.1.3 Glitch-Free DCDL Control; 4.1.4 Duty-Cycle Corrector (DCC); 4.1.5 DQ/DQS Transmitter; 4.1.6 DQ/DQS Receiver; 4.1.7 ZQ Calibration); 4.2 Modeling and Verification of Link Training; 4.3 Built-In Self-Test Circuits
Chapter 5. Quadrature Error Corrector Using Replica Serializers and Pulse-Shrinking Delay Lines: 5.1 Phase Correction Using Replica Serializers and Pulse-Shrinking Units; 5.2 Overall QEC Architecture and Its Operation; 5.3 Fine Delay Unit in the PSDL
Chapter 6. Experimental Results: 6.1 Controller PHY; 6.2 Prototype QEC
Chapter 7. Conclusion
Bibliography
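
The timing margins in this abstract are given in unit intervals (UI), where one UI is one bit period at the stated data rate. A minimal sketch of the conversion to picoseconds follows; the helper name `ui_to_ps` is hypothetical.

```python
def ui_to_ps(margin_ui, rate_mbps):
    """Convert a timing margin in unit intervals (UI) to picoseconds.

    One UI is one bit period: 1 / data rate. With the rate in Mb/s,
    the bit period in picoseconds is 1e6 / rate_mbps.
    """
    return margin_ui * 1e6 / rate_mbps


if __name__ == "__main__":
    # C/A margins quoted at 1067 Mbps, without and with the PB-FFE.
    print(round(ui_to_ps(0.23, 1067), 1), round(ui_to_ps(0.29, 1067), 1))
    # Read and write margins quoted at 2133 Mbps.
    print(round(ui_to_ps(0.53, 2133), 1), round(ui_to_ps(0.72, 2133), 1))
```

By this conversion, the reported C/A improvement from 0.23 UI to 0.29 UI at 1067 Mbps is a gain of roughly 56 ps of timing margin.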

SNU Open Repository and Archive

Design of a Low-Power Memory Controller with an Adaptive Eye Detection Method (적응형 눈 감지 방법을 포함한 저전력 메모리 컨트롤러의 설계)

Author: 김민오
Publication venue: Seoul National University Graduate School (서울대학교 대학원)
Publication date: 01/08/2017

Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering (전기·컴퓨터공학부), August 2017. 김수환.
A 4266 Mb/s/pin LPDDR4 memory controller with an asynchronous feedback continuous-time linear equalizer (AF-CTLE) and an adaptive 3-step eye detection algorithm is presented. The AF-CTLE removes the glitch of DQS without training by applying an offset larger than the noise, and improves the read margin by operating as a decision feedback equalizer in the DQ path. The adaptive 3-step eye detection algorithm reduces power consumption and black-out time in the initialization sequence and in retraining, compared with 2-dimensional full scanning. In addition, it maintains accuracy by sequentially searching the eye boundaries and by initializing the resolution using the binary search method when the eye detection result changes. To achieve high bandwidth, a transmitter and a receiver suitable for training are proposed. The transmitter consists of a phase interpolator, a digitally controlled delay line, a 16:1 serializer, a pre-driver, and low-voltage swing terminated logic. The receiver consists of a reference voltage generator, a continuous-time linear equalizer, a phase interpolator, a digitally controlled delay line, a 1:4 deserializer, and a 4:16 deserializer. The clocking architecture is also designed for low power consumption in idle periods, which are commonly lengthy in mobile applications. A prototype chip was implemented in a 65 nm CMOS process with a ball grid array package and tested with commodity LPDDR4. The write margin was 0.36 UI and 148 mV, and the read margin was enhanced from 0.30 UI and 76 mV without the AF-CTLE to 0.47 UI and 80 mV with the AF-CTLE. The power efficiencies during burst write and read were 5.68 pJ/bit and 1.83 pJ/bit, respectively.
Contents:
Chapter 1. Introduction: 1.1 Motivation; 1.2 Thesis Organization
Chapter 2. LPDDR4: 2.1 Comparison Between LPDDR3 and LPDDR4; 2.2 Source Synchronous Clocking Scheme; 2.3 Signaling Standards; 2.4 Multiple Trainings; 2.5 Re-Training and Re-Initialization
Chapter 3. Adaptive Eye Detection: 3.1 Eye Detection; 3.2 1x2y3x Eye Detection; 3.3 Adaptive Gain Control; 3.4 Adaptive 1x2y3x Eye Detection
Chapter 4. LPDDR4 Memory Controller: 4.1 Design Procedure; 4.2 Architecture (4.2.1 Transmitter; 4.2.2 Receiver; 4.2.3 Clocking Architecture); 4.3 Circuit Implementation (4.3.1 ADPLL with Multi-Modulus Divider; 4.3.2 ADDLL with Triangular-Modulated PI; 4.3.3 CTLE with Auto-DQS Cleaning; 4.3.4 DES with Clock Domain Crossing; 4.3.5 LVSTL with ZQ Calibration; 4.3.6 Coarse-Fine DCDL); 4.4 Link Training (4.4.1 Simulation Results)
Chapter 5. Measurement Results: 5.1 Measurement Setup; 5.2 Measurement Results of Sub-Block (5.2.1 ADPLL with Multi-Modulus Divider; 5.2.2 ADDLL with Triangular-Modulated PI; 5.2.3 Coarse-Fine DCDL); 5.3 LPDDR4 Interface Measurement Results
Chapter 6. Conclusion
Bibliography
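
The abstract contrasts boundary searching with 2-dimensional full scanning. The sketch below shows only the general idea of locating an eye edge by binary search along one axis. It is not the thesis's 3-step algorithm: the `passes` callback, the code ranges, and the assumption of a single contiguous passing region are all hypothetical.

```python
def find_edge(passes, inside, outside):
    """Binary-search the boundary between a passing and a failing setting.

    `passes(code)` is a hypothetical pass/fail test at one delay or
    voltage code. `inside` must pass and `outside` must fail. Returns the
    last passing code on the way from `inside` to `outside`.
    """
    while abs(outside - inside) > 1:
        mid = (inside + outside) // 2
        if passes(mid):
            inside = mid
        else:
            outside = mid
    return inside


def eye_width(passes, center, lo, hi):
    """Eye width in codes along one axis, found with two edge searches."""
    left = find_edge(passes, center, lo)
    right = find_edge(passes, center, hi)
    return right - left + 1


if __name__ == "__main__":
    # Made-up eye: delay codes 20..43 pass out of a 0..63 range.
    print(eye_width(lambda code: 20 <= code <= 43, center=30, lo=0, hi=63))
```

Each edge search needs on the order of log2(range) probes, whereas a full scan probes every code on every axis, which is the kind of saving in training time that motivates boundary-based eye detection.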

SNU Open Repository and Archive

Thin-film Block Copolymers (BCPs) Self-assembly as Versatile Patterning Scheme for Functional Nanomaterials

Author: Zhang Le
Publication venue: LSU Digital Commons
Publication date: 10/10/2018

Nanopattern generation is required for building structural entities in every production process that involves nanostructures. Advances in nanopatterning play an important role in extending current nanopatterning technologies to meet ever more demanding requirements for smaller feature sizes, smoother line-edge roughness (LER), and facile pattern transfer, in pursuit of faster computer processors, better electrocatalysts, and more compact and intelligent sensors. Conventionally, patterning has relied heavily on photolithography, a technique that has dominated the chip-making industry for more than 50 years. However, conventional photolithography is bounded by inherent resolution limits and is difficult to apply to non-flat, flexible, or stretchable substrates. Advances in patterning techniques are urgently needed to enable sub-10 nm patterning on versatile substrates. The patterning technique adopted in this work is a bottom-up, self-assembly-driven scheme based on the phase segregation of block copolymers (BCPs). A cleverly designed BCP system can generate self-assembled patterns with a sub-10 nm pitch, demonstrating the tremendous potential of novel BCP chemistries for generating sub-10 nm features. Recent work on BCPs with sub-10 nm natural periods is reviewed, and key principles for designing next-generation BCP candidates for extreme-scale lithography are proposed in the outlook. Thin-film BCP templates were leveraged to generate patterns on various substrates, including silicon, gold, glassy carbon, reduced graphene oxide, Nafion® membrane, and perfluorinated anion exchange membrane. The significance of these demonstrations is twofold: first, they showcase the robustness and wide portability of the tested BCP patterning scheme; second, they demonstrate the introduction of BCP templates onto smart substrates that have special functionality and wide implications.
Further ionization and metallization of the BCP templates exemplify the feasibility of fabricating nanostructured electrolytes and metal nanowires with controlled periodic feature sizes. Ordered nanostructures with designed ionic loadings and metal densities on functional substrates open up the possibility of incorporation into sensors, nanoseparators, and nanoreactors with novel properties that are yet to be uncovered.

Louisiana State University

PHY Link Design and Optimization For High-Speed Low-Power Communication Systems

Author: Fang Yuan
Publication date: 01/01/2015

The ever-growing demand for high-bandwidth data transfer has been driving research in the field of high-performance communication systems. Studies on single-chip performance, e.g. faster multi-core processors and higher system memory capacity, have been carried out. To further enhance system performance, research has focused on improving the data-transfer bandwidth of chip-to-chip communication over high-speed serial links. Many solutions have been proposed to overcome the bottleneck caused by non-idealities such as the bandwidth-limited electrical channel that connects two link devices and the various kinds of undesired noise in communication systems. Nevertheless, even with these solutions, data transfer has run into timing-margin limitations for high-speed interfaces running at multi-gigabit-per-second data rates on low-cost Printed Circuit Board (PCB) material with a constrained power budget. Therefore, the challenge in designing a physical layer (PHY) link for high-speed communication systems is to make it power-efficient, reliable, and cost-effective. In this context, this dissertation focuses on the architectural design and the system-level and circuit-level verification of a PHY link, as well as on system performance optimization with respect to power, reliability, and adaptability in high-speed communication systems. The PHY is mainly composed of clock data recovery (CDR), equalizers (EQs), and high-speed I/O drivers. A symmetrical PHY link structure is usually duplicated in both link devices for bidirectional data transmission. By introducing training mechanisms into high-speed communication systems, the timing in one link device is adaptively aligned to the timing condition specified in the other link device, despite different skews or induced jitter resulting from process, voltage, and temperature (PVT) variations in the individual link.
With reliable timing relationships among the interface signals, the total system bandwidth is dramatically improved. Interface training also offers high flexibility for reuse without further investigation of demanding, high-cost components. In the training mode, a CDR module is essential for reconstructing the transmitted bitstream to achieve the best data eye and for detecting the edges of the data stream in asynchronous or source-synchronous systems. Generally, the CDR works as a feedback control system that aligns its output clock to the center of the received data. In systems that contain multiple data links, the overall CDR power consumption increases linearly with the number of links, because one CDR is required for each link. Therefore, a power-efficient CDR plays a significant role in such systems with parallel links. Furthermore, a high-performance CDR requires low jitter generation in spite of high input jitter. To ease the trade-off between power consumption and CDR jitter, a novel CDR architecture is proposed that uses a proportional-integral (PI) controller and a three-times sampling scheme. Meanwhile, signal integrity (SI) becomes critical as the data rate exceeds several gigabits per second. Data distorted by system non-idealities are likely to reduce signal quality severely and result in intolerable transmission errors in worst-case scenarios, thus reducing the effective system bandwidth. Hence, additional trainings for SI purposes, such as transmitter (Tx) and receiver (Rx) EQ trainings, are inserted into the interface training. In addition, a simplified system architecture with asymmetrical placement of adaptive Rx and Tx EQs in a single link device is proposed and analyzed using different coefficient adaptation algorithms. This architecture makes it possible to eliminate a large number of EQs through training, especially in the case of parallel links.
As a result, considerable power and chip area are saved. Finally, a high-speed I/O driver that is robust against PVT variations is discussed. Critical issues such as overshoot and undershoot interfering with the data are primarily caused by impedance mismatch between the I/O driver and its transmission channel. By applying a PVT compensation technique, the I/O driver impedances can be calibrated close to the target value. Different digital impedance calibration algorithms against PVT variations are implemented and compared with respect to fast calibration and low power requirements.
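
To make the comparison of digital impedance calibration algorithms concrete, the sketch below contrasts a linear search with a binary search for a driver calibration code. It is an illustration under stated assumptions, not the dissertation's implementation: the driver is modeled as `code` identical legs in parallel, and the leg resistance, target impedance, and code range in the usage example are made-up values.

```python
def driver_impedance(code, leg_ohms):
    """Impedance of `code` identical legs in parallel (code >= 1)."""
    return leg_ohms / code


def calibrate_linear(target_ohms, leg_ohms, max_code):
    """Step the code up until the impedance drops to the target or below."""
    steps = 0
    for code in range(1, max_code + 1):
        steps += 1
        if driver_impedance(code, leg_ohms) <= target_ohms:
            return code, steps
    return max_code, steps


def calibrate_binary(target_ohms, leg_ohms, max_code):
    """Find the same code with a binary search over the comparator result."""
    lo, hi, steps = 1, max_code, 0
    while lo < hi:
        mid = (lo + hi) // 2
        steps += 1
        if driver_impedance(mid, leg_ohms) <= target_ohms:
            hi = mid
        else:
            lo = mid + 1
    return lo, steps


if __name__ == "__main__":
    # Hypothetical 480-ohm legs, 40-ohm target, 5-bit code.
    print(calibrate_linear(40.0, 480.0, 31))
    print(calibrate_binary(40.0, 480.0, 31))
```

In this toy model both searches settle on the same code, but the binary search needs far fewer comparator evaluations. That is one axis along which calibration algorithms can be compared; actual power and settling behavior depend on the circuit.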

TUbiblio

tuprints

The Architecture of a Reusable Built-In Self-Test for Link Training, IO and Memory Defect Detection and Auto Repair on 14nm Intel SOC

Author: Querbach Bruce
Publication venue: Oregon State University

The complexity of designing and testing today's system on chip (SOC) is increasing due to greater integrated circuit (IC) density and higher IO and memory frequencies. SOCs for the mobile phone and tablet market have the unique challenges of short product development windows, at times less than six months, and low-cost boards and platforms that limit physical access to test access ports (TAP). This dissertation presents the architecture of a reusable built-in self-test (BIST) engine called the converged pattern generator and checker (CPGC), which was developed to address these challenges. It is used in the critical path of millions of x86 SOCs for DDR3, DDR4, LP-DDR3, and LP-DDR4 IO initialization and link training. The CPGC is also an essential BIST engine for IO and memory defect detection and, in some cases, for the automatic repair of detected memory defects. The software and hardware infrastructure that leverages the CPU L2/L3 cache to enable cache-based testing (CBT) and the parallel execution of the CPGC Intel BIST engine is shown to improve test time by 60x to 170x over conventional TAP-based testing. In addition, silicon results are presented showing that CPGC enables easy debug of inter-symbol interference (ISI) and crosstalk issues in silicon and boards, enables fast IO link training, improves validation time by 3x, and, in some instances, reduces SOC and platform power by 5% to 11% through closed-loop IO circuit power optimization. This CPGC BIST engine has been developed into a reusable IP solution, which has been successfully designed into at least 11 Intel CPUs and SOCs (32nm-14nm), with seven of these successfully debugged, tested, and launched into the marketplace. Ultimately, this has led to over 100 million CPUs being shipped within one quarter using this architecture.

ScholarsArchive@OSU

Cross-Layer Pathfinding for Off-Chip Interconnects

Author: Srinivas Vaishnav
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

Off-chip interconnects for integrated circuits (ICs) today induce a diverse design space, spanning many different applications that require transmission of data at various bandwidths, latencies and link lengths. Off-chip interconnect design solutions are also variously sensitive to system performance, power and cost metrics, while also having a strong impact on these metrics. The costs associated with off-chip interconnects include die area, package (PKG) and printed circuit board (PCB) area, technology and bill of materials (BOM). Choices made regarding off-chip interconnects are fundamental to product definition, architecture, design implementation and technology enablement. Given their cross-layer impact, it is imperative that a cross-layer approach be employed to architect and analyze off-chip interconnects up front, so that a top-down design flow can comprehend the cross-layer impacts and correctly assess the system performance, power and cost tradeoffs for off-chip interconnects. Chip architects are not exposed to all the tradeoffs at the physical and circuit implementation or technology layers, and often lack the tools to accurately assess off-chip interconnects. Furthermore, the collaterals needed for a detailed analysis are often lacking when the chip is architected; these include circuit design and layout, PKG and PCB layout, and physical floorplan and implementation. To address the need for a framework that enables architects to assess the system-level impact of off-chip interconnects, this thesis presents power-area-timing (PAT) models for off-chip interconnects, optimization and planning tools with the appropriate abstraction using these PAT models, and die/PKG/PCB co-design methods that help expose the off-chip interconnect cross-layer metrics to the die/PKG/PCB design flows. 
Together, these models, tools and methods enable cross-layer optimization that allows for a top-down definition and exploration of the design space and helps converge on the correct off-chip interconnect implementation and technology choice. The tools presented cover off-chip memory interfaces for mobile and server products, silicon photonic interfaces, 2.5D silicon interposers and 3D through-silicon vias (TSVs). The goal of the cross-layer framework is to assess the key metrics of the interconnect (such as timing, latency, active/idle/sleep power, and area/cost) at an appropriate level of abstraction and across the layers of the design flow. In addition to signal interconnect, this thesis also explores the need for such cross-layer pathfinding for power distribution networks (PDN), where the system-on-chip (SoC) floorplan and pinmap must be optimized before the collateral layouts for PDN analysis are ready. Altogether, the developed cross-layer pathfinding methodology for off-chip interconnects enables more rapid and thorough exploration of a vast design space of off-chip parallel and serial links, inter-die and inter-chiplet links and silicon photonics. Such exploration will pave the way for off-chip interconnect technology enablement that is optimized for system needs. The basis of the framework can be extended to cover other interconnect technologies as well, since it fundamentally relates to system-level metrics that are common to all off-chip interconnects.

eScholarship - University of California

High-Performance DRAM System Design Constraints and Considerations

Author: Gross Joseph
Publication date: 01/01/2010

The effects of a realistic memory system have not received much attention in recent decades. Often, the memory controller and DRAMs are modeled as a fixed-latency or random-latency system, which leads to less accurate simulations. As more cores are added to each die and CPU clock rates continue to outpace memory access times, the gap will only grow wider and simulation results will become less accurate. This thesis examines how a memory controller and DRAM system work and attempts to model them accurately in a simulator. It uses a simulated Alpha 21264 processor in conjunction with a full system simulator and a memory system simulator. Various SPEC06 benchmarks are used to measure runtimes. The process of mapping a memory location to a physical location, the algorithm for choosing the ordering of commands sent to the DRAMs, and the method of managing the row buffers are examined in detail. We find that the choice of these algorithms and policies can affect application runtime by up to 200% or more. It is also shown that energy use can vary by up to 300% when the address mapping policy is changed. These results show that it is important to consider all the available policies in order to optimize the memory system for the type of workload that a machine will be running. No single policy is best for every application, so it is important to understand the interaction between the application and the memory system in order to improve performance and reduce the energy consumed.
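
An address mapping policy of the kind this abstract refers to decides which address bits select the bank, row, and column. The sketch below is a minimal illustration, not the thesis's simulator code: the function name, the two policy names, and the field widths are hypothetical.

```python
def split_address(addr, fields):
    """Slice an address into named bit fields, lowest bits first.

    `fields` is a list of (name, width_in_bits) pairs ordered from the
    least significant field to the most significant one.
    """
    decoded = {}
    for name, width in fields:
        decoded[name] = addr & ((1 << width) - 1)
        addr >>= width
    return decoded


# Two hypothetical policies; the field widths are illustrative only.
ROW_BANK_COLUMN = [("column", 10), ("bank", 3), ("row", 14)]
ROW_COLUMN_BANK = [("bank", 3), ("column", 10), ("row", 14)]


if __name__ == "__main__":
    for addr in (0, 1, 2):
        print(addr, split_address(addr, ROW_BANK_COLUMN),
              split_address(addr, ROW_COLUMN_BANK))
```

Under the first ordering, consecutive addresses stay in one bank and walk through its columns, which favors row-buffer hits. Under the second, they spread across banks, which favors bank-level parallelism. Which is better depends on the workload, in line with the abstract's conclusion that no single policy is best.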

Digital Repository at the University of Maryland