136 research outputs found

    A 0.8 โ€“ 2.4 Gbps Driver With Adjustable De-Emphasis Scheme For Ddr3 Memory Interface

    Get PDF
    The need for greater memory bandwidth to boost the computer system performance has driven system memory evolution to Double Data Rate Synchronous Dynamic Read Access Memory (DDR SDRAM) technologies. Trends to maximize memory bandwidth have caused Inter-Symbol Interference (ISI) become significant which degraded the signal integrity of transmitted data. In this research, a driver architecture with adjustable de-emphasis and impedance control scheme is proposed for high-data rate and high-density DDR3 SDRAM memory system. The proposed driver is implemented using 45 nm CMOS process technology. The designs and implementations of the proposed driver involve the design of driver architecture, data controller, impedance calibration block with reference generator as well as the layout for critical analog circuits i.e. three driver segments for post-layout simulations to ensure the parasitic in layout does not has significant effect on driver performances. The driver has 15 de-emphasis legs that can form 15 de-emphasis voltage levels that capable of reducing ISI-induced jitter at high operating frequency. Moreover, high density DDR3 memory system can deteriorate the far-end eye jitter and eye height that causes difficulties in data sampling and recovery. Thus, the driving impedance of the proposed driver can be programmed between 20, 30 and 40 ฮฉ to compensate the variability of board routing effect in memory system and hence, improving signal integrity

    An Energy-Efficient Reconfigurable Mobile Memory Interface for Computing Systems

    Get PDF
    The critical need for higher power efficiency and bandwidth transceiver design has significantly increased as mobile devices, such as smart phones, laptops, tablets, and ultra-portable personal digital assistants continue to be constructed using heterogeneous intellectual properties such as central processing units (CPUs), graphics processing units (GPUs), digital signal processors, dynamic random-access memories (DRAMs), sensors, and graphics/image processing units and to have enhanced graphic computing and video processing capabilities. However, the current mobile interface technologies which support CPU to memory communication (e.g. baseband-only signaling) have critical limitations, particularly super-linear energy consumption, limited bandwidth, and non-reconfigurable data access. As a consequence, there is a critical need to improve both energy efficiency and bandwidth for future mobile devices.;The primary goal of this study is to design an energy-efficient reconfigurable mobile memory interface for mobile computing systems in order to dramatically enhance the circuit and system bandwidth and power efficiency. The proposed energy efficient mobile memory interface which utilizes an advanced base-band (BB) signaling and a RF-band signaling is capable of simultaneous bi-directional communication and reconfigurable data access. It also increases power efficiency and bandwidth between mobile CPUs and memory subsystems on a single-ended shared transmission line. Moreover, due to multiple data communication on a single-ended shared transmission line, the number of transmission lines between mobile CPU and memories is considerably reduced, resulting in significant technological innovations, (e.g. more compact devices and low cost packaging to mobile communication interface) and establishing the principles and feasibility of technologies for future mobile system applications. The operation and performance of the proposed transceiver are analyzed and its circuit implementation is discussed in details. A chip prototype of the transceiver was implemented in a 65nm CMOS process technology. In the measurement, the transceiver exhibits higher aggregate data throughput and better energy efficiency compared to prior works

    ๋ฉ”๋ชจ๋ฆฌ ์ธํ„ฐํŽ˜์ด์Šค๋ฅผ ์œ„ํ•œ ๋ฉ€ํ‹ฐ ๋ ˆ๋ฒจ ๋‹จ์ผ ์ข…๋‹จ ์†ก์‹ ๊ธฐ ์„ค๊ณ„

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ (๋ฐ•์‚ฌ) -- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ๊ณต๊ณผ๋Œ€ํ•™ ์ „๊ธฐยท์ปดํ“จํ„ฐ๊ณตํ•™๋ถ€, 2020. 8. ๊น€์ˆ˜ํ™˜.๋ณธ ์—ฐ๊ตฌ์—์„œ ๋ฉ”๋ชจ๋ฆฌ ์ธํ„ฐํŽ˜์ด์Šค๋ฅผ ์œ„ํ•œ ๋ฉ€ํ‹ฐ ๋ ˆ๋ฒจ ์†ก์‹ ๊ธฐ๊ฐ€ ์ œ์‹œ๋˜์—ˆ๋‹ค. ํ”„๋กœ์„ธ์„œ์™€ ๋ฉ”๋ชจ๋ฆฌ ๊ฐ„์˜ ์„ฑ๋Šฅ ์ฐจ์ด๊ฐ€ ๋งค๋…„ ๊ณ„์† ์ฆ๊ฐ€ํ•จ์— ๋”ฐ๋ผ, ๋ฉ”๋ชจ๋ฆฌ๋Š” ์ „์ฒด ์‹œ์Šคํ…œ์˜ ๋ณ‘๋ชฉ์ ์ด ๋˜๊ณ ์žˆ๋‹ค. ์šฐ๋ฆฌ๋Š” ๋ฉ”๋ชจ๋ฆฌ ๋Œ€์—ญํญ์„ ๋Š˜๋ฆฌ๊ธฐ ์œ„ํ•ด PAM-4 ๋‹จ์ผ ์ข…๋‹จ ์†ก์‹ ๊ธฐ๋ฅผ ์ œ์•ˆํ•˜์˜€๊ณ , ๋ฉ€ํ‹ฐ ๋žญํฌ ๋ฉ”๋ชจ๋ฆฌ๋ฅผ ์œ„ํ•œ duobinary ๋‹จ์ผ ์ข…๋‹จ ์†ก์‹ ๊ธฐ๋ฅผ ์ œ์•ˆํ•˜์˜€๋‹ค. ์ œ์•ˆ๋œ PAM-4 ์†ก์‹ ๊ธฐ์˜ ๋“œ๋ผ์ด๋ฒ„๋Š” ๋†’์€ ์„ ํ˜•์„ฑ๊ณผ ์ž„ํ”ผ๋˜์Šค ์ •ํ•ฉ์„ ๋™์‹œ์— ๋งŒ์กฑํ•œ๋‹ค. ๋˜ํ•œ ์ €ํ•ญ์ด๋‚˜ ์ธ๋•ํ„ฐ๋ฅผ ์‚ฌ์šฉํ•˜์ง€ ์•Š์•„ ์ž‘์€ ๋ฉด์ ์„ ์ฐจ์ง€ํ•œ๋‹ค. ์ œ์•ˆ๋œ ZQ ์บ˜๋ฆฌ๋ธŒ๋ ˆ์ด์…˜์€ ์„ธ๊ฐœ์˜ ๊ต์ • ์ ์„ ๊ฐ€์ง€๊ณ  ์žˆ์–ด ์†ก์‹ ๊ธฐ๊ฐ€ ์ •ํ™•ํ•œ ์ž„ํ”ผ๋˜์Šค์™€ ์„ ํ˜•์ ์ธ ์ถœ๋ ฅ์„ ๊ฐ–๊ฒŒ ํ•œ๋‹ค. ํ”„๋กœํ†  ํƒ€์ž…์€ 65nm CMOS ๊ณต์ •์œผ๋กœ ์ œ์ž‘๋˜์—ˆ๊ณ  ์†ก์‹ ๊ธฐ๋Š” 0.0333mm2์˜ ๋ฉด์ ์„ ์ฐจ์ง€ํ•œ๋‹ค. ์ธก์ •๋œ 28Gb/s์—์„œ์˜ eye๋Š” 18.3ps์˜ ๊ธธ์ด์™€ 42.4mV์˜ ๋†’์ด๋ฅผ ๊ฐ–๊ณ , ์—๋„ˆ์ง€ ํšจ์œจ์€ 0.64pJ/bit์ด๋‹ค. ZQ ์บ˜๋ฆฌ๋ธŒ๋ ˆ์ด์…˜๊ณผ ํ•จ๊ป˜ ์ธก์ •๋œ RLM์€ 0.993์ด๋‹ค. ๋ฉ”๋ชจ๋ฆฌ์˜ ์šฉ๋Ÿ‰์„ ๋Š˜๋ฆฌ๊ธฐ ์œ„ํ•ด ํ•˜๋‚˜์˜ ํŒจํ‚ค์ง€์— ์—ฌ๋Ÿฌ ๊ฐœ์˜ DRAM ๋‹ค์ด๋ฅผ ์ˆ˜์ง์œผ๋กœ ์Œ“๋Š” ํŒจํ‚ค์ง•์€ ๋ฉ”๋ชจ๋ฆฌ์˜ ์ค‘์•™ ํŒจ๋“œ ๊ตฌ์กฐ์™€ ๊ฒฐํ•ฉ๋˜์–ด ์งง์€ ๋ฐ˜์‚ฌ๋ฅผ ์•ผ๊ธฐํ•˜๋Š” ์Šคํ…์„ ๋งŒ๋“ ๋‹ค. ์šฐ๋ฆฌ๋Š” ์ด ๋ฌธ์ œ๋ฅผ ์™„ํ™”ํ•˜๊ธฐ์œ„ํ•ด ๋ฐ˜์‚ฌ ๊ธฐ๋ฐ˜ duobinary ์†ก์‹ ๊ธฐ๋ฅผ ์ œ์•ˆํ–ˆ๋‹ค. ์ด ์†ก์‹ ๊ธฐ๋Š” ๋ฐ˜์‚ฌ๋ฅผ ์ด์šฉํ•˜์—ฌ duobinary signaling์„ ํ•œ๋‹ค. 2ํƒญ ๋ฐ˜๋Œ€ ๊ฐ•์กฐ ๊ธฐ์ˆ ๊ณผ ์Šฌ๋ฃจ ๋ ˆ์ดํŠธ ์กฐ์ ˆ ๊ธฐ์ˆ ์ด ์‹ ํ˜ธ ์™„๊ฒฐ์„ฑ์„ ๋†’์ด๊ธฐ ์œ„ํ•ด ์‚ฌ์šฉ๋˜์—ˆ๋‹ค. NRZ eye๊ฐ€ ์—†๋Š” 10Gb/s์—์„œ ์ธก์ •๋œ duobinary eye๋Š” 63.6ps ๊ธธ์ด์™€ 70.8mV์˜ ๋†’์ด๋ฅผ ๊ฐ–๋Š”๋‹ค. ์ธก์ •๋œ ์—๋„ˆ์ง€ ํšจ์œจ์€ 1.38pJ/bit์ด๋‹ค.Multi-level transmitters for memory interfaces have been presented. The performance gap between processor and memory has been increased by 50% every year, making memory to be a bottle neck of the overall system. To increase memory bandwidth, we have proposed a PAM-4 single-ended transmitter. To compensate for the side effect of the multi-rank memory, we have proposed a reflection-based duobinary transmitter. The proposed PAM-4 transmitter has the driver, which simultaneously satisfies impedance matching and high linearity. The driver occupies a small area due to a resistorless and inductorless structure. The proposed ZQ calibration for PAM-4 has three calibration points, which allow the transmitter to have accurate impedance and linear output. The ZQ calibration considers impedance variation of both the driver and the receiver. A prototype has been fabricated in 65nm CMOS process, and the transmitter occupies 0.0333mm2. The measured eye has a width of 18.3ps and a height of 42.4mV at 28Gb/s, and the measured energy efficiency is 0.64pJ/b. The measured RLM with the 3-point ZQ calibration is 0.993. To increase memory density, the stacked die packaging with multiple DRAM die stacked vertically in one package is widely used. However, combined with the center-pad structure, the structure creates stubs that cause short reflections. We have proposed the reflection-based duobinary transmitter to mitigate this problem. The proposed transmitter uses reflection for duobinary signaling. The 2-tap opposite FFE and the slew-rate control are used to increase signal integrity. The measured duobinary eye at 10Gb/s has a width of 63.6ps and a height of 70.8mV while there is no NRZ eye opening. The measured energy efficiency is 1.38pJ/bit.CHAPTER 1 INTRODUCTION 1 1.1 MOTIVATION 1 1.2 THESIS ORGANIZATION 8 CHAPTER 2 MUTI-LEVEL SIGNALING 9 2.1 PAM-4 SIGNALING 9 2.2 DESIGN CONSIDERATIONS FOR PAM-4 TRANSMITTER 16 2.2.1 LEVEL SEPARATION MISMATCH RATIO (RLM) 17 2.2.2 IMPEDANCE MATCHING 19 2.2.3 PRIOR ARTS 21 2.3 DUOBINARY SIGNALING 24 CHAPTER 3 HIGH-LINEARITY AND IMPEDANCE-MATCHED PAM-4 TRANSMITTER 30 3.1 OVERALL ARCHITECTURE 31 3.2 SINGLE-ENDED IMPEDANCE-MATCHED PAM-4 DRIVER 33 3.3 3-POINT ZQ CALIBRATION FOR PAM-4 47 CHAPTER 4 REFLECTION-BASED DUOBINARY TRANSMITTER 57 4.1 BIDIRECTIONAL DUAL-RANK MEMORY SYSTEM 58 4.2 CONCEPT OF REFLECTION-BASED DUOBINARY SIGNALING 66 4.3 REFLECTION-BASED DUOBINARY TRANSMITTER 70 4.3.1 OVERALL ARCHITECTURE 70 4.3.2 EQUALIZATION FOR REFLECTION-BASED DUOBINARY SIGNALING 72 4.3.3 2D BINARY-SEGMENTED DRIVER 75 CHAPTER 5 EXPERIMENTAL RESULTS 77 5.1 HIGH-LINEARITY AND IMPEDANCE-MATCHED PAM-4 TRANSMITTER 77 5.2 REFLECTION-BASED DUOBINARY TRANSMITTER 84 CHAPTER 6 92 CONCLUSION 92 BIBLIOGRAPHY 94Docto

    ํŽ„์Šค ๊ธฐ๋ฐ˜ ํ”ผ๋“œ ํฌ์›Œ๋“œ ์ดํ€„๋ผ์ด์ €๋ฅผ ๊ฐ–์ถ˜ ๊ณ ์šฉ๋Ÿ‰ DRAM์„ ์œ„ํ•œ ์ปจํŠธ๋กค๋Ÿฌ PHY ์„ค๊ณ„

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ (๋ฐ•์‚ฌ) -- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ๊ณต๊ณผ๋Œ€ํ•™ ์ „๊ธฐยท์ •๋ณด๊ณตํ•™๋ถ€, 2020. 8. ๊น€์ˆ˜ํ™˜.A controller PHY for managed DRAM solution, which is a new memory structure to maximize capacity while minimizing refresh power, is presented. Inter-symbol interference is critical in such a high-capacity DRAM interface in which many DRAM chips share a command/address (C/A) channel. A pulse-based feed-forward equalizer (PB-FFE) is introduced to reduce ISI on a C/A channel. The controller PHY supports all the training sequences specified in the DDR4 standard. A glitch-free DCDL is also adopted to perform link training efficiently and to reduce training time. The DQ transmitter adopts quarter-rate architecture to reduce output latency. For the quarter-rate transmitters in DQ, we propose a quadrature error corrector (QEC), in which clock signal phase errors are corrected using two replicas of the 4:1 serializer of the output stage. Pulse shrinking is used to compare and equalize the outputs of these two replica serializers. A controller PHY was fabricated in 55nm CMOS. The PB-FFE increases the timing margin from 0.23UI to 0.29UI at 1067Mbps. At 2133Mbps, the read timing and voltage margins are 0.53UI and 211mV after read training, and the write margins are 0.72UI and 230mV after write training. To validate the QEC effectiveness, a prototype quarter-rate transmitter, including the QEC, was fabricated to another chip in 65nm CMOS. Adopting our QEC, the experimental results show that the output phase errors of the transmitter are reduced to a residual error of 0.8ps, and the output eye width and height are improved by 84% and 61%, respectively, at a data-rate of 12.8Gbps.๋ณธ ์—ฐ๊ตฌ์—์„œ ์šฉ๋Ÿ‰์„ ์ตœ๋Œ€ํ™”ํ•˜๋ฉด์„œ๋„ ๋ฆฌํ”„๋ ˆ์‹œ ์ „๋ ฅ์„ ์ตœ์†Œํ™”ํ•  ์ˆ˜ ์žˆ๋Š” ์ƒˆ๋กœ์šด ๋ฉ”๋ชจ๋ฆฌ ๊ตฌ์กฐ์ธ ๊ด€๋ฆฌํ˜• DRAM ์†”๋ฃจ์…˜์„ ์œ„ํ•œ ์ปจํŠธ๋กค๋Ÿฌ PHY๋ฅผ ์ œ์‹œํ•˜์˜€๋‹ค. ์ด์™€ ๊ฐ™์€ ๊ณ ์šฉ๋Ÿ‰ DRAM ์ธํ„ฐํŽ˜์ด์Šค์—์„œ๋Š” ๋งŽ์€ DRAM ์นฉ์ด ๋ช…๋ น / ์ฃผ์†Œ (C/A) ์ฑ„๋„์„ ๊ณต์œ ํ•˜๊ณ  ์žˆ์–ด์„œ ์‹ฌ๋ณผ ๊ฐ„ ๊ฐ„์„ญ์ด ๋ฐœ์ƒํ•œ๋‹ค. ๋ณธ ์—ฐ๊ตฌ์—์„œ๋Š” ์ด๋Ÿฌํ•œ C/A ์ฑ„๋„์—์„œ์˜ ์‹ฌ๋ณผ ๊ฐ„ ๊ฐ„์„ญ์„ ์ค„์ด๊ธฐ ์œ„ํ•ด ํŽ„์Šค ๊ธฐ๋ฐ˜ ํ”ผ๋“œ ํฌ์›Œ๋“œ ์ดํ€„๋ผ์ด์ € (PB-FFE)๋ฅผ ์ฑ„ํƒํ•˜์˜€๋‹ค. ๋˜ํ•œ ๋ณธ ์—ฐ๊ตฌ์˜ ์ปจํŠธ๋กค๋Ÿฌ PHY๋Š” DDR4 ํ‘œ์ค€์— ์ง€์ •๋œ ๋ชจ๋“  ํŠธ๋ ˆ์ด๋‹ ์‹œํ€€์Šค๋ฅผ ์ง€์›ํ•œ๋‹ค. ๋งํฌ ํŠธ๋ ˆ์ด๋‹์„ ํšจ์œจ์ ์œผ๋กœ ์ˆ˜ํ–‰ํ•˜๊ณ  ํŠธ๋ ˆ์ด๋‹ ์‹œ๊ฐ„์„ ์ค„์ด๊ธฐ ์œ„ํ•ด ๊ธ€๋ฆฌ์น˜๊ฐ€ ๋ฐœ์ƒํ•˜์ง€ ์•Š๋Š” ๋””์ง€ํ„ธ ์ œ์–ด ์ง€์—ฐ ๋ผ์ธ (DCDL)์„ ์ฑ„ํƒํ•˜์˜€๋‹ค. ์ปจํŠธ๋กค๋Ÿฌ PHY์˜ DQ ์†ก์‹ ๊ธฐ๋Š” ์ถœ๋ ฅ ๋Œ€๊ธฐ ์‹œ๊ฐ„์„ ์ค„์ด๊ธฐ ์œ„ํ•ด ์ฟผํ„ฐ ๋ ˆ์ดํŠธ ๊ตฌ์กฐ๋ฅผ ์ฑ„ํƒํ•˜์˜€๋‹ค. ์ฟผํ„ฐ ๋ ˆ์ดํŠธ ์†ก์‹ ๊ธฐ์˜ ๊ฒฝ์šฐ์—๋Š” ์ง๊ต ํด๋Ÿญ ๊ฐ„ ์œ„์ƒ ์˜ค๋ฅ˜๊ฐ€ ์ถœ๋ ฅ ์‹ ํ˜ธ์˜ ๋ฌด๊ฒฐ์„ฑ์— ์˜ํ–ฅ์„ ์ฃผ๊ฒŒ ๋œ๋‹ค. ์ด๋Ÿฌํ•œ ์˜ํ–ฅ์„ ์ตœ์†Œํ™”ํ•˜๊ธฐ ์œ„ํ•ด ๋ณธ ์—ฐ๊ตฌ์—์„œ๋Š” ์ถœ๋ ฅ ๋‹จ์˜ 4 : 1 ์ง๋ ฌ ๋ณ€ํ™˜๊ธฐ์˜ ๋‘ ๋ณต์ œ๋ณธ์„ ์‚ฌ์šฉํ•˜์—ฌ ํด๋ก ์‹ ํ˜ธ ์œ„์ƒ ์˜ค๋ฅ˜๋ฅผ ์ˆ˜์ •ํ•˜๋Š” QEC (Quadrature Error Corrector)๋ฅผ ์ œ์•ˆํ•˜์˜€๋‹ค. ๋ณต์ œ๋œ 2๊ฐœ์˜ ์ง๋ ฌ ๋ณ€ํ™˜๊ธฐ์˜ ์ถœ๋ ฅ์„ ๋น„๊ตํ•˜๊ณ  ๊ท ๋“ฑํ™”ํ•˜๊ธฐ ์œ„ํ•ด ํŽ„์Šค ์ˆ˜์ถ• ์ง€์—ฐ ๋ผ์ธ์ด ์‚ฌ์šฉ๋˜์—ˆ๋‹ค. ์ปจํŠธ๋กค๋Ÿฌ PHY๋Š” 55nm CMOS ๊ณต์ •์œผ๋กœ ์ œ์กฐ๋˜์—ˆ๋‹ค. PB-FFE๋Š” 1067Mbps์—์„œ C/A ์ฑ„๋„ ํƒ€์ด๋ฐ ๋งˆ์ง„์„ 0.23UI์—์„œ 0.29UI๋กœ ์ฆ๊ฐ€์‹œํ‚จ๋‹ค. ์ฝ๊ธฐ ํŠธ๋ ˆ์ด๋‹ ํ›„ ์ฝ๊ธฐ ํƒ€์ด๋ฐ ๋ฐ ์ „์•• ๋งˆ์ง„์€ 2133Mbps์—์„œ 0.53UI ๋ฐ 211mV์ด๊ณ , ์“ฐ๊ธฐ ํŠธ๋ ˆ์ด๋‹ ํ›„ ์“ฐ๊ธฐ ๋งˆ์ง„์€ 0.72UI ๋ฐ 230mV์ด๋‹ค. QEC์˜ ํšจ๊ณผ๋ฅผ ๊ฒ€์ฆํ•˜๊ธฐ ์œ„ํ•ด QEC๋ฅผ ํฌํ•จํ•œ ํ”„๋กœํ†  ํƒ€์ž… ์ฟผํ„ฐ ๋ ˆ์ดํŠธ ์†ก์‹ ๊ธฐ๋ฅผ 65nm CMOS์˜ ๋‹ค๋ฅธ ์นฉ์œผ๋กœ ์ œ์ž‘ํ•˜์˜€๋‹ค. QEC๋ฅผ ์ ์šฉํ•œ ์‹คํ—˜ ๊ฒฐ๊ณผ, ์†ก์‹ ๊ธฐ์˜ ์ถœ๋ ฅ ์œ„์ƒ ์˜ค๋ฅ˜๊ฐ€ 0.8ps์˜ ์ž”๋ฅ˜ ์˜ค๋ฅ˜๋กœ ๊ฐ์†Œํ•˜๊ณ , ์ถœ๋ ฅ ๋ฐ์ดํ„ฐ ๋ˆˆ์˜ ํญ๊ณผ ๋†’์ด๊ฐ€ 12.8Gbps์˜ ๋ฐ์ดํ„ฐ ์†๋„์—์„œ ๊ฐ๊ฐ 84 %์™€ 61 % ๊ฐœ์„ ๋˜์—ˆ์Œ์„ ๋ณด์—ฌ์ค€๋‹ค.CHAPTER 1 INTRODUCTION 1 1.1 MOTIVATION 1 1.1.1 HEAVY LOAD C/A CHANNEL 5 1.1.2 QUARTER-RATE ARCHITECTURE IN DQ TRANSMITTER 7 1.1.3 SUMMARY 8 1.2 THESIS ORGANIZATION 10 CHAPTER 2 ARCHITECTURE 11 2.1 MDS DIMM STRUCTURE 11 2.2 MDS CONTROLLER 15 2.3 MDS CONTROLLER PHY 17 2.3.1 INITIALIZATION SEQUENCE 20 2.3.2 LINK TRAINING FINITE-STATE MACHINE 23 2.3.3 POWER DOWN MODE 28 CHAPTER 3 PULSE-BASED FEED-FORWARD EQUALIZER 29 3.1 COMMAND/ADDRESS CHANNEL 29 3.2 COMMAND/ADDRESS TRANSMITTER 33 3.3 PULSE-BASED FEED-FORWARD EQUALIZER 35 CHAPTER 4 CIRCUIT IMPLEMENTATION 39 4.1 BUILDING BLOCKS 39 4.1.1 ALL-DIGITAL PHASE-LOCKED LOOP (ADPLL) 39 4.1.2 ALL-DIGITAL DELAY-LOCKED LOOP (ADDLL) 44 4.1.3 GLITCH-FREE DCDL CONTROL 47 4.1.4 DUTY-CYCLE CORRECTOR (DCC) 50 4.1.5 DQ/DQS TRANSMITTER 52 4.1.6 DQ/DQS RECEIVER 54 4.1.7 ZQ CALIBRATION 56 4.2 MODELING AND VERIFICATION OF LINK TRAINING 59 4.3 BUILT-IN SELF-TEST CIRCUITS 66 CHAPTER 5 QUADRATURE ERROR CORRECTOR USING REPLICA SERIALIZERS AND PULSE-SHRINKING DELAY LINES 69 5.1 PHASE CORRECTION USING REPLICA SERIALIZERS AND PULSE-SHRINKING UNITS 69 5.2 OVERALL QEC ARCHITECTURE AND ITS OPERATION 71 5.3 FINE DELAY UNIT IN THE PSDL 76 CHAPTER 6 EXPERIMENTAL RESULTS 78 6.1 CONTROLLER PHY 78 6.2 PROTOTYPE QEC 88 CHAPTER 7 CONCLUSION 94 BIBLIOGRAPHY 96Docto

    ์ ์‘ํ˜• ๋ˆˆ ๊ฐ์ง€ ๋ฐฉ๋ฒ•์„ ํฌํ•จํ•œ ์ €์ „๋ ฅ ๋ฉ”๋ชจ๋ฆฌ ์ปจํŠธ๋กค๋Ÿฌ์˜ ์„ค๊ณ„

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ (๋ฐ•์‚ฌ)-- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› ๊ณต๊ณผ๋Œ€ํ•™ ์ „๊ธฐยท์ปดํ“จํ„ฐ๊ณตํ•™๋ถ€, 2017. 8. ๊น€์ˆ˜ํ™˜.and the read margin was enhanced from 0.30UI and 76mV without AF-CTLE to 0.47UI and 80mV to with AF-CTLE. The power efficiency during burst write and read were 5.68pJ/bit and 1.83pJ/bit respectively.A 4266Mb/s/pin LPDDR4 memory controller with an asynchronous feedback continuous-time linear equalizer and an adaptive 3-step eye detection algorithm is presented. The asynchronous feedback continuous-time linear equalizer removes the glitch of DQS without training by applying an offset larger than the noise, and improves read margin by operating as a decision feedback equalizer in DQ path. The adaptive 3-step eye detection algorithm reduces power consumption and black-out time in initialization sequence and retraining in comparison to the 2-dimensional full scanning. In addition, the adaptive 3-step eye detection algorithm can maintain the accuracy by sequentially searching the eye boundaries and initializing the resolution using the binary search method when the eye detection result changes. To achieve high bandwidth, a transmitter and receiver suitable for training are proposed. The transmitter consists of a phase interpolator, a digitally-controlled delay line, a 16:1 serializer, a pre-driver and low-voltage swing terminated logic. The receiver consists of a reference voltage generator, a continuous-time linear equalizer, a phase interpolator, a digitally-controlled delay line, a 1:4 deserializer, and a 4:16 deserializer. The clocking architecture is also designed for low power consumption in idle periods, which are commonly lengthy in mobile applications. A prototype chip was implemented in a 65nm CMOS process with ball grid array package and tested with commodity LPDDR4. The write margin was 0.36UI and 148mVCHAPTER 1 INTRODUCTION 1 1.1 MOTIVATION 1 1.2 THESIS ORGANIZATION 5 CHAPTER 2 LPDDR4 6 2.1 COMPARISON BETWEEN LPDDR3 AND LPDDR4 6 2.2 SOURCE SYNCHRONOUS CLOCKING SCHEME 9 2.3 SIGNALING STANDARDS 11 2.4 MULTIPLE TRAININGS 14 2.5 RE-TRAINING AND RE-INITIALIZATION 16 CHAPTER 3 ADAPTIVE EYE DETECTION 18 3.1 EYE DETECTION 18 3.2 1X2Y3X EYE DETECTION 20 3.3 ADAPTIVE GAIN CONTROL 22 3.4 ADAPTIVE 1X2Y3X EYE DETECTION 24 CHAPTER 4 LPDDR4 MEMORY CONTROLLER 26 4.1 DESIGN PROCEDURE 26 4.2 ARCHITECTURE 30 4.2.1 TRANSMITTER 33 4.2.2 RECEIVER 35 4.2.3 CLOCKING ARCHITECTURE 38 4.3 CIRCUIT IMPLEMENTATION 43 4.3.1 ADPLL WITH MULTI-MODULUS DIVIDER 43 4.3.2 ADDLL WITH TRIANGULAR-MODULATED PI 45 4.3.3 CTLE WITH AUTO-DQS CLEANING 47 4.3.4 DES WITH CLOCK DOMAIN CROSSING 52 4.3.5 LVSTL WITH ZQ CALIBRATION 54 4.3.6 COARSE-FINE DCDL 56 4.4 LINK TRAINING 57 4.4.1 SIMULATION RESULTS 59 CHAPTER 5 MEASUREMENT RESULTS 72 5.1 MEASUREMENT SETUP 72 5.2 MEASUREMENT RESULTS OF SUB-BLOCK 80 5.2.1 ADPLL WITH MULTI-MODULUS DIVIDER 80 5.2.2 ADDLL WITH TRIANGULAR-MODULATED PI 82 5.2.3 COARSE-FINE DCDL 84 5.3 LPDDR4 INTERFACE MEASUREMENT RESULTS 84 CHAPTER 6 CONCLUSION 88 BIBLIOGRAPHY 90Docto

    Thin-film Block Copolymers (BCPs) Self-assembly as Versatile Patterning Scheme for Functional Nanomaterials

    Get PDF
    Nanopattern generation is required for building various structural entities in every production process that involves nanostructures. Advancing nanopatterning technologies play an important role in developing and broadening the current nanopatterning technologies to meet up with the ever-demanding requirements in the realm of smaller feature sizes, smoother line-edge roughness (LER) and facile pattern transfer in pursuit of faster computer processors, better electrocatalysts and more compact and intelligent sensors, etc. Conventionally, patterning needs are heavily relied on photolithography, a technique that dominate chip-making industry for more than 50 years. However conventional photolithography is bounded by inherent resolution limits and difficult to be applied on non-flat, flexible, or stretchable substrates. Advancement in patterning techniques are urgently needed to enhance the capability for sub-10 nm patterning onto versatile substrates. The patterning techniques adopted in this work is a bottom-up self-assembly driven scheme based on the phase segregation of block copolymers (BCPs). Cleverly designed BCPs system can generate self-assembled pattern to give a sub-10 nm pitch, demonstrating the tremendous potential of novel BCP chemistries in generating sub-10 nm features. Recent excellent works on BCPs with sub-10 nm natural periods are timely reviewed, and key principles in designing next generation BCP candidates for extreme scale lithography are proposed in the outlook. Thin film BCPs templates were leveraged to generate patterns on various substrates including silicon, gold, glassy carbon, reduced graphene oxide, Nafion ยฎ membrane, and perfluorinated anion exchange membrane. The profound meaning of these demonstration is twofold, firstly showcased the robustness and wide portability of the tested BCP patterning scheme, secondly demonstration of introducing the BCP templates onto smart substrates that have special functionality and wide implications. Further ionization and metallization of BCPs templates exemplify the feasibility of fabricating nanostructured electrolytes and metal nanowires with controlled periodic features sizes. Ordered nanostructures with designed ionic loadings, metal densities on functional substrates open up tremendous possibility to be incorporated into sensor, nanoseparators and nanoreactors with novel properties that yet to be uncovered

    PHY Link Design and Optimization For High-Speed Low-Power Communication Systems

    Get PDF
    The ever-growing demands for high-bandwidth data transfer have been pushing towards advancing research efforts in the field of high-performing communication systems. Studies on the performance of single chip, e.g. faster multi-core processors and higher system memory capacity, have been explored. To further enhance the system performance, researches have been focused on the improvement of data-transfer bandwidth for chip-to-chip communication in the high-speed serial link. Many solutions have been addressed to overcome the bottleneck caused by the non-idealties such as bandwidth-limited electrical channel that connects two link devices and varieties of undesired noise in the communication systems. Nevertheless, with these solutions data have run into limitations of the timing margins for high-speed interfaces running at multiple gigabits per second data rates on low-cost Printed Circuit Board (PCB) material with constrained power budget. Therefore, the challenge in designing a physical layer (PHY) link for high-speed communication systems turns out to be power-efficient, reliable and cost-effective. In this context, this dissertation is intended to focus on architectural design, system-level and circuit-level verification of a PHY link as well as system performance optimization in respective of power, reliability and adaptability in high-speed communication systems. The PHY is mainly composed of clock data recovery (CDR), equalizers (EQs) and high- speed I/O drivers. Symmetrical structure of the PHY link is usually duplicated in both link devices for bidirectional data transmission. By introducing training mechanisms into high-speed communication systems, the timing in one link device is adaptively aligned to the timing condition specified in the other link device despite of different skews or induced jitter resulting from process, voltage and temperature (PVT) variations in the individual link. With reliable timing relationships among the interface signals provided, the total system bandwidth is dramatically improved. On the other hand, interface training offers high flexibility for reuse without further investigation on high demanding components involved in high costs. In the training mode, a CDR module is essential for reconstructing the transmitted bitstream to achieve the best data eye and to detect the edges of data stream in asynchronous systems or source-synchronous systems. Generally, the CDR works as a feedback control system that aligns its output clock to the center of the received data. In systems that contain multiple data links, the overall CDR power consumption increases linearly with the increase in number of links as one CDR is required for each link. Therefore, a power-efficient CDR plays a significant role in such systems with parallel links. Furthermore, a high performance CDR requires low jitter generation in spite of high input jitter. To minimize the trade-off between power consumption and CDR jitter, a novel CDR architecture is proposed by utilizing the proportional-integral (PI) controller and three times sampling scheme. Meanwhile, signal integrity (SI) becomes critical as the data rate exceeds several gigabits per second. Distorted data due to the non-idealties in systems are likely to reduce the signal quality aggressively and result in intolerable transmission errors in worst case scenarios, thus affect the system effective bandwidth. Hence, additional trainings such as transmitter (Tx) and receiver (Rx) EQ trainings for SI purpose are inserted into the interface training. Besides, a simplified system architecture with unsymmetrical placement of adaptive Rx and Tx EQs in a single link device is proposed and analyzed by using different coefficient adaptation algorithms. This architecture enables to reduce a large number of EQs through the training, especially in case of parallel links. Meanwhile, considerable power and chip area are saved. Finally, high-speed I/O driver against PVT variations is discussed. Critical issues such as overshoot and undershoot interfering with the data are primarily accompanied by impedance mismatch between the I/O driver and its transmitting channel. By applying PVT compensation technique I/O driver impedances can be effectively calibrated close to the target value. Different digital impedance calibration algorithms against PVT variations are implemented and compared for achieving fast calibration and low power requirements

    High-Performance DRAM System Design Constraints and Considerations

    Get PDF
    The effects of a realistic memory system have not received much attention in recent decades. Often, the memory controller and DRAMs are modeled as a fixed-latency or random-latency system, which leads to simulations that are less accurate. As more cores are added to each die and CPU clock rates continue to outpace memory access times, the gap will only grow wider and simulation results will be less accurate. This thesis proposes to look at the way a memory controller and DRAM system work and attempt to model them accurately in a simulator. It will use a simulated Alpha 21264 processor in conjunction with a full system simulator and memory system simulator. Various SPEC06 benchmarks are used to look at runtimes. The process of mapping a memory location to a physical location, the algorithm for choosing the ordering of commands to be sent to the DRAMs and the method of managing the row buffers are examined in detail. We find that the choice in these algorithms and policies can affect application runtime by up to 200% or more. It is also shown that energy use can vary by up to 300% by changing changing the address mapping policy. These results show that it is important to look at all the available policies to optimize the memory system for the type of workload that a machine will be running. No single policy is best for every application, so it is important to understand the interaction of the application and the memory system to improve performance and reduce the energy consumed
    • โ€ฆ
    corecore