41 research outputs found

    ๋ฉ”๋ชจ๋ฆฌ ์ธํ„ฐํŽ˜์ด์Šค๋ฅผ ์œ„ํ•œ 4 ๋ ˆ๋ฒจ ํŽ„์Šค ์ง„ํญ ๋ณ€์กฐ ์ฟผํ„ฐ ๋ ˆ์ดํŠธ ์ˆ˜์‹ ๊ธฐ ์„ค๊ณ„

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ(๋ฐ•์‚ฌ) -- ์„œ์šธ๋Œ€ํ•™๊ต๋Œ€ํ•™์› : ๊ณต๊ณผ๋Œ€ํ•™ ์ „๊ธฐยท์ •๋ณด๊ณตํ•™๋ถ€, 2022. 8. ๊น€์ˆ˜ํ™˜.๋ณธ ์—ฐ๊ตฌ์—์„œ๋Š” ๋ฉ”๋ชจ๋ฆฌ ์ธํ„ฐํŽ˜์ด์Šค๋ฅผ ์œ„ํ•œ 4 ๋ ˆ๋ฒจ ํŽ„์Šค ์ง„ํญ ๋ณ€์กฐ (PAM-4) ์ˆ˜์‹ ๊ธฐ์™€ ์ง๊ต ํด๋ก์„ ์ƒ์„ฑํ•˜๋Š” ์ง๊ต ์‹ ํ˜ธ ๋ณด์ •๊ธฐ๋ฅผ ์ œ์•ˆ๋œ๋‹ค. ๋ฐ์ดํ„ฐ ์„ผํ„ฐ์—์„œ ์ฆ๊ฐ€ํ•˜๋Š” IP ํŠธ๋ž˜ํ”ฝ์€ ๊ณ ์† ๋ฐ ์ €์ „๋ ฅ ๋ฉ”๋ชจ๋ฆฌ ์ธํ„ฐํŽ˜์ด์Šค์— ๋Œ€ํ•œ ์ˆ˜์š”๋ฅผ ์ฆ๊ฐ€์‹œ์ผœ์™”๋‹ค. ์ด๋Ÿฌํ•œ ์š”๊ตฌ๋ฅผ ๋งŒ์กฑ์‹œํ‚ค๊ธฐ ์œ„ํ•ด ํด๋Ÿญ ๋ฐ ๋‚˜์ดํ€ด์ŠคํŠธ ์ฃผํŒŒ์ˆ˜๋ฅผ ๋†’์ด์ง€ ์•Š๊ณ ๋„ ๋ฐ์ดํ„ฐ ์ „์†ก๋ฅ ์„ ๋†’์ผ ์ˆ˜ ์žˆ๋Š” PAM-4 ์‹ ํ˜ธ๊ฐ€ ์ฃผ๋ชฉ์„ ๋ฐ›๊ณ  ์žˆ๋‹ค. PAM-4 ์‹ ํ˜ธ๋Š” ์ œ๋กœ ๋น„ ๋ณต๊ท€ ์‹ ํ˜ธ (NRZ) ๋ณด๋‹ค 3๋ฐฐ ๋‚ฎ์€ ์ˆ˜์ง ๋งˆ์ง„์„ ๊ฐ€์ง€๋ฉฐ, ์ด๋Š” ๊ฒฐ์ • ํ”ผ๋“œ๋ฐฑ ์ดํ€„๋ผ์ด์ € ๋‚ด ์Šฌ๋ผ์ด์Šค์˜ ํด๋Ÿญ-ํ ๋”œ๋ ˆ์ด๋ฅผ ์ฆ๊ฐ€์‹œํ‚ค๋ฉฐ, ์ด๋กœ ์ธํ•ด PAM-4 ๊ฒฐ์ • ํ”ผ๋“œ๋ฐฑ ์ดํ€„๋ผ์ด์ €์˜ ์„ฑ๋Šฅ์„ ์ œํ•œํ•˜๋Š” ์š”์ธ์ด๋‹ค. ๋ณธ ์—ฐ๊ตฌ์—์„œ๋Š” ์ธ๋ฒ„ํ„ฐ ๊ธฐ๋ฐ˜์˜ ํ•ฉ์‚ฐ๊ธฐ๋ฅผ ์ด์šฉ, ์„ ํƒ์ ์œผ๋กœ ์‹ ํ˜ธ๋ฅผ ์ฆํญ์‹œํ‚ค๋Š” ๊ฒฐ์ • ํ”ผ๋“œ๋ฐฑ ์ดํ€„๋ผ์ด์ €๋ฅผ ์‚ฌ์šฉํ•จ์œผ๋กœ์จ ์Šฌ๋ผ์ด์„œ์˜ ์ „๋ ฅ ์†Œ๋ชจ๋ฅผ ์ฆ๊ฐ€์‹œํ‚ค์ง€ ์•Š์œผ๋ฉด์„œ ์Šฌ๋ผ์ด์„œ์˜ ํด๋Ÿญ-ํ ๋”œ๋ ˆ์ด๋ฅผ ์ค„์ผ ์ˆ˜ ์žˆ๋‹ค. ๋˜ํ•œ, ์ ์‘ํ˜• ์ง€์—ฐ ์ด๋“ ์ปจํŠธ๋กค๋Ÿฌ๋ฅผ ํฌํ•จํ•˜๋Š” ์ง๊ต ์‹ ํ˜ธ ๋ณด์ •๊ธฐ๋Š” ๋†’์€ ์ •ํ™•๋„์™€ ๋น ๋ฅธ ์Šคํ ๋ณด์ •์œผ๋กœ ์ฟผ๋“œ๋Ÿฌ์ฒ˜ ํด๋Ÿญ ๊ฐ„์˜ ์Šคํ๋ฅผ ๊ต์ •ํ•  ์ˆ˜ ์žˆ๋‹ค. ์„ ํƒ์  ๋ˆˆ ์ฆํญ ๊ฒฐ์ • ํ”ผ๋“œ๋ฐฑ ์ดํ€„๋ผ์ด์ €์™€ ์ ์‘ํ˜• ์ง€์—ฐ ์ด๋“ ์ปจํŠธ๋กค๋Ÿฌ๋ฅผ ํฌํ•จํ•˜๋Š” ์ง๊ต ์‹ ํ˜ธ ๋ณด์ •๊ธฐ์˜ ์„ฑ๋Šฅ์„ ๊ฒ€์ฆํ•˜๊ธฐ ์œ„ํ•ด ํ”„๋กœํ† ํƒ€์ž… ์นฉ์„ ์ œ์ž‘ํ•˜์˜€๋‹ค. ์ œ์ž‘๋œ ์นฉ์€ 65 nm CMOS ๊ณต์ •์œผ๋กœ ์ œ์ž‘๋˜์—ˆ๋‹ค. ํ”„๋กœํ† ํƒ€์ž… ์นฉ์€ 24 Gb/s/pin ์—์„œ 10-12 ์˜ ๋น„ํŠธ ์—๋Ÿฌ์œจ์„ 100 mUI ์˜ ์‹ ํ˜ธ ๋„ˆ๋น„๋กœ ๋‹ฌ์„ฑํ•˜์˜€๋‹ค. ํ”„๋กœํ† ํƒ€์ž… ์นฉ ๋‚ด PAM-4 ์ˆ˜์‹ ๊ธฐ๋Š” 0.73 pJ/b ์˜ ์—๋„ˆ์ง€ ํšจ์œจ์„ ๊ฐ–๋Š”๋‹ค. ๋˜ํ•œ ์ ์‘ํ˜• ์ง€์—ฐ ์ด๋“ ์ปจํŠธ๋กค๋Ÿฌ๋ฅผ ํฌํ•จํ•˜๋Š” ์ง๊ต ์‹ ํ˜ธ ๋ณด์ •๊ธฐ๋Š” 3 GHz ์ฟผ๋“œ๋Ÿฌ์ฒ˜ ํด๋Ÿญ ๊ฐ„ ์ตœ๋Œ€ 21.2 ps ์˜ ์Šคํ๋ฅผ 0.8 ps ๊นŒ์ง€ ์ค„์ผ ์ˆ˜ ์žˆ์œผ๋ฉฐ, ์ด ๋•Œ 76.9 ns ์˜ ๊ต์ • ์‹œ๊ฐ„์„ ๊ฐ–๋Š”๋‹ค. ์ œ์•ˆํ•˜๋Š” ์ง๊ต ์‹ ํ˜ธ ๋ณด์ •๊ธฐ๋Š” 3 GHz ์—์„œ 2.15 mW/GHz ์˜ ์ „๋ ฅ ํšจ์œจ์„ ๊ฐ–๋Š”๋‹ค.A four-level pulse amplitude modulation (PAM-4) receiver, and a quadrature signal corrector (QSC) that generates quadrature clocks for memory interfaces is presented. Increasing IP traffic in data centers has increased the demand for high-speed and low-power memory interfaces. To satisfy this demand, PAM-4 signaling, which can increase data-rate without increasing clock and Nyquist frequency, is received considerable attention. PAM- signaling has vertical which three times lower than non-return-to-zero (NRZ) signaling, which makes the clock-to-Q delay of the slicer in the decision feedback equalizer (DFE) increases. This makes the DFE difficult to satisfy the timing constraint. In this paper, by using a DFE with inverter-based summers, the clock-to-Q delay of the slicer can be reduced without increasing the power consumption of the slicers. Also, the QSC using an adaptive delay gain controller can correct the skew between the quadrature clock with low skew and short correction time. The prototype receiver including the DFE with the inverter-based summer and the QSC using the adaptive delay gain controller was fabricated in 65 nm CMOS process. The prototype chip can achieve a bit error rate (BER) of 10-12 at 24 Gb/s/pin, and at this time, an eye width of 100 mUI is secured. The efficiency of the receiver is 0.73 pJ/b. In addition, the QSC cna reduce the maximum 21.2 ps of skew between 3 GHz quadrature clocks to 0.8 ps and has a correction time of 76.9 ns. The efficiency of the QSC is 2.15 mW/GHz.ABSTRACT 1 CONTENTS 3 LIST OF FIGURES 5 LIST OF TABLE 9 CHAPTER 1 1 INTRODUCTION 1 1.1 MOTIVATION 1 1.2 PAM-4 SIGNALING 7 1.2.1 DESIGN CONSIDERATIONS ON PAM-4 RECEIVER 10 1.2.2 PRIOR WORKS 14 1.3 QUARTER-RATE ARCHITECTURE 18 1.3.1 DESIGN CONSIDERATION ON QUARTER-RATE ARCHITECTURE 20 1.3.2 PRIOR WORKS 25 1.4 SUMMARY 28 1.5 THESIS ORGANIZATION 30 CHAPTER 2 31 CONCEPTS OF DFE WITH INVERTER-BASED SUMMER 31 2.1 CONCEPTUAL ARCHITECTURE OF DFE WITH INVERTER-BASED SUMMER 32 2.2 DESIGN CONSIDERATION OF INVERTER-BASED SUMMER 37 CHAPTER 3 41 CONCEPTS OF QUADRATURE SIGNAL CORRECTOR USING ADAPTIVE DELAY GAIN CONTROLLER 41 3.1 OPERATION OF PROPOSED QUADRATURE SIGNAL CORRECTOR 42 3.2 LOOP FILTER INCLUDING ADAPTIVE DELAY GAIN CONTROLLER 45 CHAPTER 4 48 ARCHITECTURE AND IMPLEMENTATION 48 4.1 OVERALL ARCHITECTURE 49 4.2 ANALOG FRONT END 52 4.3 DECISION FEEDBACK EQUALIZER WITH INVERTER-BASED SUMMER 54 4.4 CLOCK PATH 62 4.5 QUADRATURE SIGNAL CORRECTOR WITH ADAPTIVE DELAY GAIN CONTROLLER 63 CHAPTER 5 70 EXPERIMENTAL RESULTS 70 5.1 EXPERIMENTAL SETUP 70 5.2 EXPERIMENTAL RESULTS 74 5.2.1 MEASUREMENT RESULTS OF PAM-4 RECEIVER WITH DECISION FEEDBACK EQUALIZER USING INVERTER-BASED SUMMER 74 5.2.2 MEASUREMENT RESULTS OF QUADRATURE SIGNAL CORRECTOR USING ADAPTIVE DELAY GAIN CONTROLLER 77 CHAPTER 6 83 CONCLUSION 83 BIBLIOGRAPHY 86๋ฐ•

    ํŽ„์Šค ๊ธฐ๋ฐ˜ ํ”ผ๋“œ ํฌ์›Œ๋“œ ์ดํ€„๋ผ์ด์ €๋ฅผ ๊ฐ–์ถ˜ ๊ณ ์šฉ๋Ÿ‰ DRAM์„ ์œ„ํ•œ ์ปจํŠธ๋กค๋Ÿฌ PHY ์„ค๊ณ„

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ (๋ฐ•์‚ฌ) -- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ๊ณต๊ณผ๋Œ€ํ•™ ์ „๊ธฐยท์ •๋ณด๊ณตํ•™๋ถ€, 2020. 8. ๊น€์ˆ˜ํ™˜.A controller PHY for managed DRAM solution, which is a new memory structure to maximize capacity while minimizing refresh power, is presented. Inter-symbol interference is critical in such a high-capacity DRAM interface in which many DRAM chips share a command/address (C/A) channel. A pulse-based feed-forward equalizer (PB-FFE) is introduced to reduce ISI on a C/A channel. The controller PHY supports all the training sequences specified in the DDR4 standard. A glitch-free DCDL is also adopted to perform link training efficiently and to reduce training time. The DQ transmitter adopts quarter-rate architecture to reduce output latency. For the quarter-rate transmitters in DQ, we propose a quadrature error corrector (QEC), in which clock signal phase errors are corrected using two replicas of the 4:1 serializer of the output stage. Pulse shrinking is used to compare and equalize the outputs of these two replica serializers. A controller PHY was fabricated in 55nm CMOS. The PB-FFE increases the timing margin from 0.23UI to 0.29UI at 1067Mbps. At 2133Mbps, the read timing and voltage margins are 0.53UI and 211mV after read training, and the write margins are 0.72UI and 230mV after write training. To validate the QEC effectiveness, a prototype quarter-rate transmitter, including the QEC, was fabricated to another chip in 65nm CMOS. Adopting our QEC, the experimental results show that the output phase errors of the transmitter are reduced to a residual error of 0.8ps, and the output eye width and height are improved by 84% and 61%, respectively, at a data-rate of 12.8Gbps.๋ณธ ์—ฐ๊ตฌ์—์„œ ์šฉ๋Ÿ‰์„ ์ตœ๋Œ€ํ™”ํ•˜๋ฉด์„œ๋„ ๋ฆฌํ”„๋ ˆ์‹œ ์ „๋ ฅ์„ ์ตœ์†Œํ™”ํ•  ์ˆ˜ ์žˆ๋Š” ์ƒˆ๋กœ์šด ๋ฉ”๋ชจ๋ฆฌ ๊ตฌ์กฐ์ธ ๊ด€๋ฆฌํ˜• DRAM ์†”๋ฃจ์…˜์„ ์œ„ํ•œ ์ปจํŠธ๋กค๋Ÿฌ PHY๋ฅผ ์ œ์‹œํ•˜์˜€๋‹ค. ์ด์™€ ๊ฐ™์€ ๊ณ ์šฉ๋Ÿ‰ DRAM ์ธํ„ฐํŽ˜์ด์Šค์—์„œ๋Š” ๋งŽ์€ DRAM ์นฉ์ด ๋ช…๋ น / ์ฃผ์†Œ (C/A) ์ฑ„๋„์„ ๊ณต์œ ํ•˜๊ณ  ์žˆ์–ด์„œ ์‹ฌ๋ณผ ๊ฐ„ ๊ฐ„์„ญ์ด ๋ฐœ์ƒํ•œ๋‹ค. ๋ณธ ์—ฐ๊ตฌ์—์„œ๋Š” ์ด๋Ÿฌํ•œ C/A ์ฑ„๋„์—์„œ์˜ ์‹ฌ๋ณผ ๊ฐ„ ๊ฐ„์„ญ์„ ์ค„์ด๊ธฐ ์œ„ํ•ด ํŽ„์Šค ๊ธฐ๋ฐ˜ ํ”ผ๋“œ ํฌ์›Œ๋“œ ์ดํ€„๋ผ์ด์ € (PB-FFE)๋ฅผ ์ฑ„ํƒํ•˜์˜€๋‹ค. ๋˜ํ•œ ๋ณธ ์—ฐ๊ตฌ์˜ ์ปจํŠธ๋กค๋Ÿฌ PHY๋Š” DDR4 ํ‘œ์ค€์— ์ง€์ •๋œ ๋ชจ๋“  ํŠธ๋ ˆ์ด๋‹ ์‹œํ€€์Šค๋ฅผ ์ง€์›ํ•œ๋‹ค. ๋งํฌ ํŠธ๋ ˆ์ด๋‹์„ ํšจ์œจ์ ์œผ๋กœ ์ˆ˜ํ–‰ํ•˜๊ณ  ํŠธ๋ ˆ์ด๋‹ ์‹œ๊ฐ„์„ ์ค„์ด๊ธฐ ์œ„ํ•ด ๊ธ€๋ฆฌ์น˜๊ฐ€ ๋ฐœ์ƒํ•˜์ง€ ์•Š๋Š” ๋””์ง€ํ„ธ ์ œ์–ด ์ง€์—ฐ ๋ผ์ธ (DCDL)์„ ์ฑ„ํƒํ•˜์˜€๋‹ค. ์ปจํŠธ๋กค๋Ÿฌ PHY์˜ DQ ์†ก์‹ ๊ธฐ๋Š” ์ถœ๋ ฅ ๋Œ€๊ธฐ ์‹œ๊ฐ„์„ ์ค„์ด๊ธฐ ์œ„ํ•ด ์ฟผํ„ฐ ๋ ˆ์ดํŠธ ๊ตฌ์กฐ๋ฅผ ์ฑ„ํƒํ•˜์˜€๋‹ค. ์ฟผํ„ฐ ๋ ˆ์ดํŠธ ์†ก์‹ ๊ธฐ์˜ ๊ฒฝ์šฐ์—๋Š” ์ง๊ต ํด๋Ÿญ ๊ฐ„ ์œ„์ƒ ์˜ค๋ฅ˜๊ฐ€ ์ถœ๋ ฅ ์‹ ํ˜ธ์˜ ๋ฌด๊ฒฐ์„ฑ์— ์˜ํ–ฅ์„ ์ฃผ๊ฒŒ ๋œ๋‹ค. ์ด๋Ÿฌํ•œ ์˜ํ–ฅ์„ ์ตœ์†Œํ™”ํ•˜๊ธฐ ์œ„ํ•ด ๋ณธ ์—ฐ๊ตฌ์—์„œ๋Š” ์ถœ๋ ฅ ๋‹จ์˜ 4 : 1 ์ง๋ ฌ ๋ณ€ํ™˜๊ธฐ์˜ ๋‘ ๋ณต์ œ๋ณธ์„ ์‚ฌ์šฉํ•˜์—ฌ ํด๋ก ์‹ ํ˜ธ ์œ„์ƒ ์˜ค๋ฅ˜๋ฅผ ์ˆ˜์ •ํ•˜๋Š” QEC (Quadrature Error Corrector)๋ฅผ ์ œ์•ˆํ•˜์˜€๋‹ค. ๋ณต์ œ๋œ 2๊ฐœ์˜ ์ง๋ ฌ ๋ณ€ํ™˜๊ธฐ์˜ ์ถœ๋ ฅ์„ ๋น„๊ตํ•˜๊ณ  ๊ท ๋“ฑํ™”ํ•˜๊ธฐ ์œ„ํ•ด ํŽ„์Šค ์ˆ˜์ถ• ์ง€์—ฐ ๋ผ์ธ์ด ์‚ฌ์šฉ๋˜์—ˆ๋‹ค. ์ปจํŠธ๋กค๋Ÿฌ PHY๋Š” 55nm CMOS ๊ณต์ •์œผ๋กœ ์ œ์กฐ๋˜์—ˆ๋‹ค. PB-FFE๋Š” 1067Mbps์—์„œ C/A ์ฑ„๋„ ํƒ€์ด๋ฐ ๋งˆ์ง„์„ 0.23UI์—์„œ 0.29UI๋กœ ์ฆ๊ฐ€์‹œํ‚จ๋‹ค. ์ฝ๊ธฐ ํŠธ๋ ˆ์ด๋‹ ํ›„ ์ฝ๊ธฐ ํƒ€์ด๋ฐ ๋ฐ ์ „์•• ๋งˆ์ง„์€ 2133Mbps์—์„œ 0.53UI ๋ฐ 211mV์ด๊ณ , ์“ฐ๊ธฐ ํŠธ๋ ˆ์ด๋‹ ํ›„ ์“ฐ๊ธฐ ๋งˆ์ง„์€ 0.72UI ๋ฐ 230mV์ด๋‹ค. QEC์˜ ํšจ๊ณผ๋ฅผ ๊ฒ€์ฆํ•˜๊ธฐ ์œ„ํ•ด QEC๋ฅผ ํฌํ•จํ•œ ํ”„๋กœํ†  ํƒ€์ž… ์ฟผํ„ฐ ๋ ˆ์ดํŠธ ์†ก์‹ ๊ธฐ๋ฅผ 65nm CMOS์˜ ๋‹ค๋ฅธ ์นฉ์œผ๋กœ ์ œ์ž‘ํ•˜์˜€๋‹ค. QEC๋ฅผ ์ ์šฉํ•œ ์‹คํ—˜ ๊ฒฐ๊ณผ, ์†ก์‹ ๊ธฐ์˜ ์ถœ๋ ฅ ์œ„์ƒ ์˜ค๋ฅ˜๊ฐ€ 0.8ps์˜ ์ž”๋ฅ˜ ์˜ค๋ฅ˜๋กœ ๊ฐ์†Œํ•˜๊ณ , ์ถœ๋ ฅ ๋ฐ์ดํ„ฐ ๋ˆˆ์˜ ํญ๊ณผ ๋†’์ด๊ฐ€ 12.8Gbps์˜ ๋ฐ์ดํ„ฐ ์†๋„์—์„œ ๊ฐ๊ฐ 84 %์™€ 61 % ๊ฐœ์„ ๋˜์—ˆ์Œ์„ ๋ณด์—ฌ์ค€๋‹ค.CHAPTER 1 INTRODUCTION 1 1.1 MOTIVATION 1 1.1.1 HEAVY LOAD C/A CHANNEL 5 1.1.2 QUARTER-RATE ARCHITECTURE IN DQ TRANSMITTER 7 1.1.3 SUMMARY 8 1.2 THESIS ORGANIZATION 10 CHAPTER 2 ARCHITECTURE 11 2.1 MDS DIMM STRUCTURE 11 2.2 MDS CONTROLLER 15 2.3 MDS CONTROLLER PHY 17 2.3.1 INITIALIZATION SEQUENCE 20 2.3.2 LINK TRAINING FINITE-STATE MACHINE 23 2.3.3 POWER DOWN MODE 28 CHAPTER 3 PULSE-BASED FEED-FORWARD EQUALIZER 29 3.1 COMMAND/ADDRESS CHANNEL 29 3.2 COMMAND/ADDRESS TRANSMITTER 33 3.3 PULSE-BASED FEED-FORWARD EQUALIZER 35 CHAPTER 4 CIRCUIT IMPLEMENTATION 39 4.1 BUILDING BLOCKS 39 4.1.1 ALL-DIGITAL PHASE-LOCKED LOOP (ADPLL) 39 4.1.2 ALL-DIGITAL DELAY-LOCKED LOOP (ADDLL) 44 4.1.3 GLITCH-FREE DCDL CONTROL 47 4.1.4 DUTY-CYCLE CORRECTOR (DCC) 50 4.1.5 DQ/DQS TRANSMITTER 52 4.1.6 DQ/DQS RECEIVER 54 4.1.7 ZQ CALIBRATION 56 4.2 MODELING AND VERIFICATION OF LINK TRAINING 59 4.3 BUILT-IN SELF-TEST CIRCUITS 66 CHAPTER 5 QUADRATURE ERROR CORRECTOR USING REPLICA SERIALIZERS AND PULSE-SHRINKING DELAY LINES 69 5.1 PHASE CORRECTION USING REPLICA SERIALIZERS AND PULSE-SHRINKING UNITS 69 5.2 OVERALL QEC ARCHITECTURE AND ITS OPERATION 71 5.3 FINE DELAY UNIT IN THE PSDL 76 CHAPTER 6 EXPERIMENTAL RESULTS 78 6.1 CONTROLLER PHY 78 6.2 PROTOTYPE QEC 88 CHAPTER 7 CONCLUSION 94 BIBLIOGRAPHY 96Docto

    ์ ์‘ํ˜• ๋ˆˆ ๊ฐ์ง€ ๋ฐฉ๋ฒ•์„ ํฌํ•จํ•œ ์ €์ „๋ ฅ ๋ฉ”๋ชจ๋ฆฌ ์ปจํŠธ๋กค๋Ÿฌ์˜ ์„ค๊ณ„

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ (๋ฐ•์‚ฌ)-- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› ๊ณต๊ณผ๋Œ€ํ•™ ์ „๊ธฐยท์ปดํ“จํ„ฐ๊ณตํ•™๋ถ€, 2017. 8. ๊น€์ˆ˜ํ™˜.and the read margin was enhanced from 0.30UI and 76mV without AF-CTLE to 0.47UI and 80mV to with AF-CTLE. The power efficiency during burst write and read were 5.68pJ/bit and 1.83pJ/bit respectively.A 4266Mb/s/pin LPDDR4 memory controller with an asynchronous feedback continuous-time linear equalizer and an adaptive 3-step eye detection algorithm is presented. The asynchronous feedback continuous-time linear equalizer removes the glitch of DQS without training by applying an offset larger than the noise, and improves read margin by operating as a decision feedback equalizer in DQ path. The adaptive 3-step eye detection algorithm reduces power consumption and black-out time in initialization sequence and retraining in comparison to the 2-dimensional full scanning. In addition, the adaptive 3-step eye detection algorithm can maintain the accuracy by sequentially searching the eye boundaries and initializing the resolution using the binary search method when the eye detection result changes. To achieve high bandwidth, a transmitter and receiver suitable for training are proposed. The transmitter consists of a phase interpolator, a digitally-controlled delay line, a 16:1 serializer, a pre-driver and low-voltage swing terminated logic. The receiver consists of a reference voltage generator, a continuous-time linear equalizer, a phase interpolator, a digitally-controlled delay line, a 1:4 deserializer, and a 4:16 deserializer. The clocking architecture is also designed for low power consumption in idle periods, which are commonly lengthy in mobile applications. A prototype chip was implemented in a 65nm CMOS process with ball grid array package and tested with commodity LPDDR4. The write margin was 0.36UI and 148mVCHAPTER 1 INTRODUCTION 1 1.1 MOTIVATION 1 1.2 THESIS ORGANIZATION 5 CHAPTER 2 LPDDR4 6 2.1 COMPARISON BETWEEN LPDDR3 AND LPDDR4 6 2.2 SOURCE SYNCHRONOUS CLOCKING SCHEME 9 2.3 SIGNALING STANDARDS 11 2.4 MULTIPLE TRAININGS 14 2.5 RE-TRAINING AND RE-INITIALIZATION 16 CHAPTER 3 ADAPTIVE EYE DETECTION 18 3.1 EYE DETECTION 18 3.2 1X2Y3X EYE DETECTION 20 3.3 ADAPTIVE GAIN CONTROL 22 3.4 ADAPTIVE 1X2Y3X EYE DETECTION 24 CHAPTER 4 LPDDR4 MEMORY CONTROLLER 26 4.1 DESIGN PROCEDURE 26 4.2 ARCHITECTURE 30 4.2.1 TRANSMITTER 33 4.2.2 RECEIVER 35 4.2.3 CLOCKING ARCHITECTURE 38 4.3 CIRCUIT IMPLEMENTATION 43 4.3.1 ADPLL WITH MULTI-MODULUS DIVIDER 43 4.3.2 ADDLL WITH TRIANGULAR-MODULATED PI 45 4.3.3 CTLE WITH AUTO-DQS CLEANING 47 4.3.4 DES WITH CLOCK DOMAIN CROSSING 52 4.3.5 LVSTL WITH ZQ CALIBRATION 54 4.3.6 COARSE-FINE DCDL 56 4.4 LINK TRAINING 57 4.4.1 SIMULATION RESULTS 59 CHAPTER 5 MEASUREMENT RESULTS 72 5.1 MEASUREMENT SETUP 72 5.2 MEASUREMENT RESULTS OF SUB-BLOCK 80 5.2.1 ADPLL WITH MULTI-MODULUS DIVIDER 80 5.2.2 ADDLL WITH TRIANGULAR-MODULATED PI 82 5.2.3 COARSE-FINE DCDL 84 5.3 LPDDR4 INTERFACE MEASUREMENT RESULTS 84 CHAPTER 6 CONCLUSION 88 BIBLIOGRAPHY 90Docto

    An Energy-Efficient Reconfigurable Mobile Memory Interface for Computing Systems

    Get PDF
    The critical need for higher power efficiency and bandwidth transceiver design has significantly increased as mobile devices, such as smart phones, laptops, tablets, and ultra-portable personal digital assistants continue to be constructed using heterogeneous intellectual properties such as central processing units (CPUs), graphics processing units (GPUs), digital signal processors, dynamic random-access memories (DRAMs), sensors, and graphics/image processing units and to have enhanced graphic computing and video processing capabilities. However, the current mobile interface technologies which support CPU to memory communication (e.g. baseband-only signaling) have critical limitations, particularly super-linear energy consumption, limited bandwidth, and non-reconfigurable data access. As a consequence, there is a critical need to improve both energy efficiency and bandwidth for future mobile devices.;The primary goal of this study is to design an energy-efficient reconfigurable mobile memory interface for mobile computing systems in order to dramatically enhance the circuit and system bandwidth and power efficiency. The proposed energy efficient mobile memory interface which utilizes an advanced base-band (BB) signaling and a RF-band signaling is capable of simultaneous bi-directional communication and reconfigurable data access. It also increases power efficiency and bandwidth between mobile CPUs and memory subsystems on a single-ended shared transmission line. Moreover, due to multiple data communication on a single-ended shared transmission line, the number of transmission lines between mobile CPU and memories is considerably reduced, resulting in significant technological innovations, (e.g. more compact devices and low cost packaging to mobile communication interface) and establishing the principles and feasibility of technologies for future mobile system applications. The operation and performance of the proposed transceiver are analyzed and its circuit implementation is discussed in details. A chip prototype of the transceiver was implemented in a 65nm CMOS process technology. In the measurement, the transceiver exhibits higher aggregate data throughput and better energy efficiency compared to prior works

    ๋ฉ”๋ชจ๋ฆฌ ์ธํ„ฐํŽ˜์ด์Šค๋ฅผ ์œ„ํ•œ ๋ฉ€ํ‹ฐ ๋ ˆ๋ฒจ ๋‹จ์ผ ์ข…๋‹จ ์†ก์‹ ๊ธฐ ์„ค๊ณ„

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ (๋ฐ•์‚ฌ) -- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ๊ณต๊ณผ๋Œ€ํ•™ ์ „๊ธฐยท์ปดํ“จํ„ฐ๊ณตํ•™๋ถ€, 2020. 8. ๊น€์ˆ˜ํ™˜.๋ณธ ์—ฐ๊ตฌ์—์„œ ๋ฉ”๋ชจ๋ฆฌ ์ธํ„ฐํŽ˜์ด์Šค๋ฅผ ์œ„ํ•œ ๋ฉ€ํ‹ฐ ๋ ˆ๋ฒจ ์†ก์‹ ๊ธฐ๊ฐ€ ์ œ์‹œ๋˜์—ˆ๋‹ค. ํ”„๋กœ์„ธ์„œ์™€ ๋ฉ”๋ชจ๋ฆฌ ๊ฐ„์˜ ์„ฑ๋Šฅ ์ฐจ์ด๊ฐ€ ๋งค๋…„ ๊ณ„์† ์ฆ๊ฐ€ํ•จ์— ๋”ฐ๋ผ, ๋ฉ”๋ชจ๋ฆฌ๋Š” ์ „์ฒด ์‹œ์Šคํ…œ์˜ ๋ณ‘๋ชฉ์ ์ด ๋˜๊ณ ์žˆ๋‹ค. ์šฐ๋ฆฌ๋Š” ๋ฉ”๋ชจ๋ฆฌ ๋Œ€์—ญํญ์„ ๋Š˜๋ฆฌ๊ธฐ ์œ„ํ•ด PAM-4 ๋‹จ์ผ ์ข…๋‹จ ์†ก์‹ ๊ธฐ๋ฅผ ์ œ์•ˆํ•˜์˜€๊ณ , ๋ฉ€ํ‹ฐ ๋žญํฌ ๋ฉ”๋ชจ๋ฆฌ๋ฅผ ์œ„ํ•œ duobinary ๋‹จ์ผ ์ข…๋‹จ ์†ก์‹ ๊ธฐ๋ฅผ ์ œ์•ˆํ•˜์˜€๋‹ค. ์ œ์•ˆ๋œ PAM-4 ์†ก์‹ ๊ธฐ์˜ ๋“œ๋ผ์ด๋ฒ„๋Š” ๋†’์€ ์„ ํ˜•์„ฑ๊ณผ ์ž„ํ”ผ๋˜์Šค ์ •ํ•ฉ์„ ๋™์‹œ์— ๋งŒ์กฑํ•œ๋‹ค. ๋˜ํ•œ ์ €ํ•ญ์ด๋‚˜ ์ธ๋•ํ„ฐ๋ฅผ ์‚ฌ์šฉํ•˜์ง€ ์•Š์•„ ์ž‘์€ ๋ฉด์ ์„ ์ฐจ์ง€ํ•œ๋‹ค. ์ œ์•ˆ๋œ ZQ ์บ˜๋ฆฌ๋ธŒ๋ ˆ์ด์…˜์€ ์„ธ๊ฐœ์˜ ๊ต์ • ์ ์„ ๊ฐ€์ง€๊ณ  ์žˆ์–ด ์†ก์‹ ๊ธฐ๊ฐ€ ์ •ํ™•ํ•œ ์ž„ํ”ผ๋˜์Šค์™€ ์„ ํ˜•์ ์ธ ์ถœ๋ ฅ์„ ๊ฐ–๊ฒŒ ํ•œ๋‹ค. ํ”„๋กœํ†  ํƒ€์ž…์€ 65nm CMOS ๊ณต์ •์œผ๋กœ ์ œ์ž‘๋˜์—ˆ๊ณ  ์†ก์‹ ๊ธฐ๋Š” 0.0333mm2์˜ ๋ฉด์ ์„ ์ฐจ์ง€ํ•œ๋‹ค. ์ธก์ •๋œ 28Gb/s์—์„œ์˜ eye๋Š” 18.3ps์˜ ๊ธธ์ด์™€ 42.4mV์˜ ๋†’์ด๋ฅผ ๊ฐ–๊ณ , ์—๋„ˆ์ง€ ํšจ์œจ์€ 0.64pJ/bit์ด๋‹ค. ZQ ์บ˜๋ฆฌ๋ธŒ๋ ˆ์ด์…˜๊ณผ ํ•จ๊ป˜ ์ธก์ •๋œ RLM์€ 0.993์ด๋‹ค. ๋ฉ”๋ชจ๋ฆฌ์˜ ์šฉ๋Ÿ‰์„ ๋Š˜๋ฆฌ๊ธฐ ์œ„ํ•ด ํ•˜๋‚˜์˜ ํŒจํ‚ค์ง€์— ์—ฌ๋Ÿฌ ๊ฐœ์˜ DRAM ๋‹ค์ด๋ฅผ ์ˆ˜์ง์œผ๋กœ ์Œ“๋Š” ํŒจํ‚ค์ง•์€ ๋ฉ”๋ชจ๋ฆฌ์˜ ์ค‘์•™ ํŒจ๋“œ ๊ตฌ์กฐ์™€ ๊ฒฐํ•ฉ๋˜์–ด ์งง์€ ๋ฐ˜์‚ฌ๋ฅผ ์•ผ๊ธฐํ•˜๋Š” ์Šคํ…์„ ๋งŒ๋“ ๋‹ค. ์šฐ๋ฆฌ๋Š” ์ด ๋ฌธ์ œ๋ฅผ ์™„ํ™”ํ•˜๊ธฐ์œ„ํ•ด ๋ฐ˜์‚ฌ ๊ธฐ๋ฐ˜ duobinary ์†ก์‹ ๊ธฐ๋ฅผ ์ œ์•ˆํ–ˆ๋‹ค. ์ด ์†ก์‹ ๊ธฐ๋Š” ๋ฐ˜์‚ฌ๋ฅผ ์ด์šฉํ•˜์—ฌ duobinary signaling์„ ํ•œ๋‹ค. 2ํƒญ ๋ฐ˜๋Œ€ ๊ฐ•์กฐ ๊ธฐ์ˆ ๊ณผ ์Šฌ๋ฃจ ๋ ˆ์ดํŠธ ์กฐ์ ˆ ๊ธฐ์ˆ ์ด ์‹ ํ˜ธ ์™„๊ฒฐ์„ฑ์„ ๋†’์ด๊ธฐ ์œ„ํ•ด ์‚ฌ์šฉ๋˜์—ˆ๋‹ค. NRZ eye๊ฐ€ ์—†๋Š” 10Gb/s์—์„œ ์ธก์ •๋œ duobinary eye๋Š” 63.6ps ๊ธธ์ด์™€ 70.8mV์˜ ๋†’์ด๋ฅผ ๊ฐ–๋Š”๋‹ค. ์ธก์ •๋œ ์—๋„ˆ์ง€ ํšจ์œจ์€ 1.38pJ/bit์ด๋‹ค.Multi-level transmitters for memory interfaces have been presented. The performance gap between processor and memory has been increased by 50% every year, making memory to be a bottle neck of the overall system. To increase memory bandwidth, we have proposed a PAM-4 single-ended transmitter. To compensate for the side effect of the multi-rank memory, we have proposed a reflection-based duobinary transmitter. The proposed PAM-4 transmitter has the driver, which simultaneously satisfies impedance matching and high linearity. The driver occupies a small area due to a resistorless and inductorless structure. The proposed ZQ calibration for PAM-4 has three calibration points, which allow the transmitter to have accurate impedance and linear output. The ZQ calibration considers impedance variation of both the driver and the receiver. A prototype has been fabricated in 65nm CMOS process, and the transmitter occupies 0.0333mm2. The measured eye has a width of 18.3ps and a height of 42.4mV at 28Gb/s, and the measured energy efficiency is 0.64pJ/b. The measured RLM with the 3-point ZQ calibration is 0.993. To increase memory density, the stacked die packaging with multiple DRAM die stacked vertically in one package is widely used. However, combined with the center-pad structure, the structure creates stubs that cause short reflections. We have proposed the reflection-based duobinary transmitter to mitigate this problem. The proposed transmitter uses reflection for duobinary signaling. The 2-tap opposite FFE and the slew-rate control are used to increase signal integrity. The measured duobinary eye at 10Gb/s has a width of 63.6ps and a height of 70.8mV while there is no NRZ eye opening. The measured energy efficiency is 1.38pJ/bit.CHAPTER 1 INTRODUCTION 1 1.1 MOTIVATION 1 1.2 THESIS ORGANIZATION 8 CHAPTER 2 MUTI-LEVEL SIGNALING 9 2.1 PAM-4 SIGNALING 9 2.2 DESIGN CONSIDERATIONS FOR PAM-4 TRANSMITTER 16 2.2.1 LEVEL SEPARATION MISMATCH RATIO (RLM) 17 2.2.2 IMPEDANCE MATCHING 19 2.2.3 PRIOR ARTS 21 2.3 DUOBINARY SIGNALING 24 CHAPTER 3 HIGH-LINEARITY AND IMPEDANCE-MATCHED PAM-4 TRANSMITTER 30 3.1 OVERALL ARCHITECTURE 31 3.2 SINGLE-ENDED IMPEDANCE-MATCHED PAM-4 DRIVER 33 3.3 3-POINT ZQ CALIBRATION FOR PAM-4 47 CHAPTER 4 REFLECTION-BASED DUOBINARY TRANSMITTER 57 4.1 BIDIRECTIONAL DUAL-RANK MEMORY SYSTEM 58 4.2 CONCEPT OF REFLECTION-BASED DUOBINARY SIGNALING 66 4.3 REFLECTION-BASED DUOBINARY TRANSMITTER 70 4.3.1 OVERALL ARCHITECTURE 70 4.3.2 EQUALIZATION FOR REFLECTION-BASED DUOBINARY SIGNALING 72 4.3.3 2D BINARY-SEGMENTED DRIVER 75 CHAPTER 5 EXPERIMENTAL RESULTS 77 5.1 HIGH-LINEARITY AND IMPEDANCE-MATCHED PAM-4 TRANSMITTER 77 5.2 REFLECTION-BASED DUOBINARY TRANSMITTER 84 CHAPTER 6 92 CONCLUSION 92 BIBLIOGRAPHY 94Docto

    Hybrid NRZ/Multi-Tone Signaling for High-Speed Low-Power Wireline Transceivers

    Get PDF
    Over the past few decades, incessant growth of Internet networking traffic and High-Performance Computing (HPC) has led to a tremendous demand for data bandwidth. Digital communication technologies combined with advanced integrated circuit scaling trends have enabled the semiconductor and microelectronic industry to dramatically scale the bandwidth of high-loss interfaces such as Ethernet, backplane, and Digital Subscriber Line (DSL). The key to achieving higher bandwidth is to employ equalization technique to compensate the channel impairments such as Inter-Symbol Interference (ISI), crosstalk, and environmental noise. Therefore, todayรขs advanced input/outputs (I/Os) has been equipped with sophisticated equalization techniques to push beyond the uncompensated bandwidth of the system. To this end, process scaling has continually increased the data processing capability and improved the I/O performance over the last 15 years. However, since the channel bandwidth has not scaled with the same pace, the required signal processing and equalization circuitry becomes more and more complicated. Thereby, the energy efficiency improvements are largely offset by the energy needed to compensate channel impairments. In this design paradigm, re-thinking about the design strategies in order to not only satisfy the bandwidth performance, but also to improve power-performance becomes an important necessity. It is well known in communication theory that coding and signaling schemes have the potential to provide superior performance over band-limited channels. However, the choice of the optimum data communication algorithm should be considered by accounting for the circuit level power-performance trade-offs. In this thesis we have investigated the application of new algorithm and signaling schemes in wireline communications, especially for communication between microprocessors, memories, and peripherals. A new hybrid NRZ/Multi-Tone (NRZ/MT) signaling method has been developed during the course of this research. The system-level and circuit-level analysis, design, and implementation of the proposed signaling method has been performed in the frame of this work, and the silicon measurement results have proved the efficiency and the robustness of the proposed signaling methodology for wireline interfaces. In the first part of this work, a 7.5 Gb/s hybrid NRZ/MT transceiver (TRX) for multi-drop bus (MDB) memory interfaces is designed and fabricated in 40 nm CMOS technology. Reducing the complexity of the equalization circuitry on the receiver (RX) side, the proposed architecture achieves 1 pJ/bit link efficiency for a MDB channel bearing 45 dB loss at 2.5 GHz. The measurement results of the first prototype confirm that NRZ/MT serial data TRX can offer an energy-efficient solution for MDB memory interfaces. Motivated by the satisfying results of the first prototype, in the second phase of this research we have exploited the properties of multi-tone signaling, especially orthogonality among different sub-bands, to reduce the effect of crosstalk in high-dense wireline interconnects. A four-channel transceiver has been implemented in a standard CMOS 40 nm technology in order to demonstrate the performance of NRZ/MT signaling in presence of high channel loss and strong crosstalk noise. The proposed system achieves 1 pJ/bit power efficiency, while communicating over a MDB memory channel at 36 Gb/s aggregate data rate

    State of the art in chip-to-chip interconnects

    Get PDF
    This thesis presents a study of short-range links for chips mounted in the same package, on printed circuit boards or interposers. Implemented in CMOS technology between 7 and 250 nm, with links that operate at a data rate between 0,4 and 112 Gb/s/pin and with energy efficiencies from 0,3 to 67,7 pJ/bit. The links operate on channels with an attenuation lower than 50 dB. A comparison is made with graphical representations between the different articles that shows the correlation between the different essential metrics of chip-to-chip interconnects, as well as its evolution over the last 20 years.Esta tesis presenta un estudio de enlaces de corto alcance para chips montados en un mismo paquete, en placas de circuito impreso o intercaladores. Implementado en tecnologรญa CMOS entre 7 y 250 nm, con enlaces que operan a una velocidad de datos entre 0,4 y 112 Gb/s/pin y con eficiencias energรฉticas de 0,3 a 67,7 pJ/bit. Los enlaces operan en canales con una atenuaciรณn inferior a 50 dB. Se realiza una comparaciรณn con representaciones grรกficas entre los diferentes artรญculos que muestra la correlaciรณn entre las distintas mรฉtricas esenciales de las interconexiones chip a chip, asรญ como su evoluciรณn en los รบltimos 20 aรฑos.Aquesta tesi presenta un estudi d'enllaรงos de curt abast per a xips muntats en el mateix paquet, en plaques de circuits impresos o interposers. Implementat en tecnologia CMOS entre 7 i 250 nm, amb enllaรงos que funcionen a una velocitat de dades entre 0,4 i 112 Gb/s/pin i amb eficiรจncies energรจtiques de 0,3 a 67,7 pJ/bit. Els enllaรงos funcionen en canals amb una atenuaciรณ inferior a 50 dB. Es fa una comparaciรณ amb representacions grร fiques entre els diferents articles que mostra la correlaciรณ entre les diferents mรจtriques essencials d'interconnexions xip a xip, aixรญ com la seva evoluciรณ en els darrers 20 anys

    Design Techniques for Energy Efficient Multi-GB/S Serial I/O Transceivers

    Get PDF
    Total I/O bandwidth demand is growing in high-performance systems due to the emergence of many-core microprocessors and in mobile devices to support the next generation of multi-media features. High-speed serial I/O energy efficiency must improve in order to enable continued scaling of these parallel computing platforms in applications ranging from data centers to smart mobile devices. The first work, a low-power forwarded-clock I/O transceiver architecture is presented that employs a high degree of output/input multiplexing, supply-voltage scaling with data rate, and low-voltage circuit techniques to enable low-power operation. The transmitter utilizes a 4:1 output multiplexing voltage-mode driver along with 4-phase clocking that is efficiently generated from a passive poly-phase filter. The output driver voltage swing is accurately controlled from 100-200 mV_(ppd) using a low-voltage pseudo-differential regulator that employs a partial negative-resistance load for improved low frequency gain. 1:8 input de-multiplexing is performed at the receiver equalizer output with 8 parallel input samplers clocked from an 8-phase injection-locked oscillator that provides more than 1UI de-skew range. Low-power high-speed serial I/O transmitters which include equalization to compensate for channel frequency dependent loss are required to meet the aggressive link energy efficiency targets of future systems. The second work presents a low power serial link transmitter design that utilizes an output stage which combines a voltage-mode driver, which offers low static-power dissipation, and current-mode equalization, which offers low complexity and dynamic-power dissipation. The utilization of current-mode equalization decouples the equalization settings and termination impedance, allowing for a significant reduction in pre-driver complexity relative to segmented voltage-mode drivers. Proper transmitter series termination is set with an impedance control loop which adjusts the on-resistance of the output transistors in the driver voltage-mode portion. Further reductions in dynamic power dissipation are achieved through scaling the serializer and local clock distribution supply with data rate. Finally, it presents that a scalable quarter-rate transmitter employs an analog-controlled impedance-modulated 2-tap voltage-mode equalizer and achieves fast power-state transitioning with a replica-biased regulator and ILO clock generation. Capacitively-driven 2 mm global clock distribution and automatic phase calibration allows for aggressive supply scaling

    A LPDDR4 MEMORY CONTROLLER DESIGN WITH EYE CENTER DETECTION ALGORITHM

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ (๋ฐ•์‚ฌ)-- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ์ „๊ธฐยท์ปดํ“จํ„ฐ๊ณตํ•™๋ถ€, 2016. 2. ๊น€์ˆ˜ํ™˜.The demand for higher bandwidth with reduced power consumption in mobile memory is increasing. In this thesis, architecture of the LPDDR4 memory controller, operated with a LPDDR4 memory, is proposed and designed, and efficient training algorithm, which is appropriate for this architecture, is proposed for memory training and verification. The operation speed range of the LPDDR4 memory specification is from 533Mbps to 4266Mbps, and the LPDDR4 memory controller is designed to support that range of the LPDDR4 memory. The phase-locked loop in the LPDDR4 memory controller is designed to operate between 1333MHz and 2133MHz. To cover the range of the LPDDR4 memory, the selectable frequency divider is used to provide operation clock. The output frequency of the phase-locked loop with divider is from 266MHz to 2133MHz. The delay-locked loop in the LPDDR4 memory controller is designed to operate between 266MHz and 2133MHz with 180หš phase locking. The delay-locked loop is used each training operation, which is command training, data read and write training. To complete training in each training stage, eye center detection algorithm is used. The circuits for the proposed eye center detection algorithm such as delay line, phase interpolator and reference generator are designed and validated. The proposed 1x2y3x eye center detection algorithm is 23 times faster than conventional two-dimensional eye center detection algorithm and it can be implemented simply. Using 65nm CMOS process, the proposed LPDDR4 memory controller occupies 12mm2. The verification of the LPDDR4 memory controller is performed with commodity LPDDR4 memory. The verification of all training sequence, which is power on, initializing, boot up, command training, write leveling, read training, write training, is performed in this environment. The low voltage swing terminated logic driver and other several functions, including write leveling and data transmission, are verified at 4266Mbps and the entire LPDDR4 memory controller operations from 566Mbps to 1600Mbps are verified. The proposed eye center detection algorithm is verified from 566Mbps to 2843Mbps.CHAPTER 1 INTRODUCTION 1 1.1 MOTIVATION 1 1.2 INTRODUCTION 5 1.3 THESIS ORGANIZATION 7 CHAPTER 2 LPDDR4 MEMORY CONTROLLER DESIGN 8 2.1 DIFFERENCE BETWEEN LPDDR3 AND LPDDR4 MEMORY 8 2.1.1 ARCHITECTURAL DIFFERENCE BETWEEN LPDDR3 AND LPDDR4 MEMORY 10 2.1.2 SOURCE SYNCHRONOUS MATCHED SCHEME AND UNMATCHED SCHEME 11 2.1.3 LOW VOLTAGE SWING TERMINATED LOGIC DRIVER AND TERMINATION SCHEME 12 2.2 LPDDR4 MEMORY CONTROLLER SPECIFICATION 15 2.3 DESIGN PROCEDURE 18 CHAPTER 3 LPDDR4 MEMORY CONTROLLER ARCHITECTURE BASED ON MEMORY TRAINING 20 3.1 LPDDR4 MEMORY TRAINING SEQUENCE 20 3.2 LPDDR4 MEMORY TRAINING EYE DETECTION ALGORITHM 24 3.2.1 EYE CENTER DETECTION 24 3.2.2 1X2Y3X EYE CENTER DETECTION ALGORITHM 27 3.3. LPDDR4 MEMORY CONTROLLER DESIGN BASED ON MEMORY TRAINING 31 3.3.1 ARCHITECTURE FOR MEMORY BOOT UP AND POWER UP 31 3.3.2 CLOCK PATH ARCHITECTURE AND CLOCK TREE 34 3.3.3 COMMAND TRAINING AND COMMAND PATH ARCHITECTURE 35 3.3.4 WRITE LEVELING AND DATA STROBE TRANSMISSION PATH ARCHITECTURE 39 3.3.5 READ TRAINING AND READ PATH ARCHITECTURE 41 3.3.6 WRITE TRAINING AND WRITE PATH ARCHITECTURE 43 3.3.7 NORMAL READ/WRITE OPERATION AND MARGIN TEST 46 CHAPTER 4 LPDDR4 MEMORY CONTROLLER ARCHITECTURE MODELING AND CIRCUIT DESIGN 48 4.1 OVERALL LPDDR4 MEMORY CONTROLLER ARCHITECTURE MODELING 48 4.2 SIMULATION RESULT OF LPDDR4 MEMORY CONTROLLER MODELING 51 4.3 LPDDR4 MEMORY CONTROLLER CIRCUIT DESIGN 61 4.3.1 PHASE-LOCKED LOOP 61 4.3.2 DELAY-LOCKED LOOP 65 4.3.3 TRANSMITTER OF LPDDR4 MEMORY CONTROLLER: WRITE PATH 70 4.3.4 DE-SERIALIZER WITH CLOCK DOMAIN CROSSING 75 CHAPTER 5 MEASUREMENT RESULT OF LPDDR4 MEMORY CONTROLLER 77 5.1 LPDDR4 MEMORY CONTROLLER MEASUREMENT SETUP 77 5.1.1 LPDDR4 MEMORY CONTROLLER FLOOR PLAN AND LAYOUT 77 5.1.2 PACKAGE AND TEST BOARD 79 5.2 LPDDR4 MEMORY CONTROLLER SUB-BLOCK MEASUREMENT 81 5.2.1 PHASE-LOCKED LOOP 81 5.2.2 DELAY-LOCKED LOOP 83 5.2.3 200PS AND 800PS DELAY LINE 85 5.2.4 VOLTAGE REFERENCE GENERATOR 86 5.2.5 PHASE INTERPOLATOR 87 5.3 LPDDR4 MEMORY SYSTEM OPERATION MEASUREMENT 90 CHAPTER 6 CONCLUSION 93 APPENDIX OPERATION FLOW CHART OF THE PROPOSED LPDDR4 MEMORY CONTROLLER 95 BIBLIOGRAPHY 118 KOREAN ABSTRACT 124Docto

    PHY Link Design and Optimization For High-Speed Low-Power Communication Systems

    Get PDF
    The ever-growing demands for high-bandwidth data transfer have been pushing towards advancing research efforts in the field of high-performing communication systems. Studies on the performance of single chip, e.g. faster multi-core processors and higher system memory capacity, have been explored. To further enhance the system performance, researches have been focused on the improvement of data-transfer bandwidth for chip-to-chip communication in the high-speed serial link. Many solutions have been addressed to overcome the bottleneck caused by the non-idealties such as bandwidth-limited electrical channel that connects two link devices and varieties of undesired noise in the communication systems. Nevertheless, with these solutions data have run into limitations of the timing margins for high-speed interfaces running at multiple gigabits per second data rates on low-cost Printed Circuit Board (PCB) material with constrained power budget. Therefore, the challenge in designing a physical layer (PHY) link for high-speed communication systems turns out to be power-efficient, reliable and cost-effective. In this context, this dissertation is intended to focus on architectural design, system-level and circuit-level verification of a PHY link as well as system performance optimization in respective of power, reliability and adaptability in high-speed communication systems. The PHY is mainly composed of clock data recovery (CDR), equalizers (EQs) and high- speed I/O drivers. Symmetrical structure of the PHY link is usually duplicated in both link devices for bidirectional data transmission. By introducing training mechanisms into high-speed communication systems, the timing in one link device is adaptively aligned to the timing condition specified in the other link device despite of different skews or induced jitter resulting from process, voltage and temperature (PVT) variations in the individual link. With reliable timing relationships among the interface signals provided, the total system bandwidth is dramatically improved. On the other hand, interface training offers high flexibility for reuse without further investigation on high demanding components involved in high costs. In the training mode, a CDR module is essential for reconstructing the transmitted bitstream to achieve the best data eye and to detect the edges of data stream in asynchronous systems or source-synchronous systems. Generally, the CDR works as a feedback control system that aligns its output clock to the center of the received data. In systems that contain multiple data links, the overall CDR power consumption increases linearly with the increase in number of links as one CDR is required for each link. Therefore, a power-efficient CDR plays a significant role in such systems with parallel links. Furthermore, a high performance CDR requires low jitter generation in spite of high input jitter. To minimize the trade-off between power consumption and CDR jitter, a novel CDR architecture is proposed by utilizing the proportional-integral (PI) controller and three times sampling scheme. Meanwhile, signal integrity (SI) becomes critical as the data rate exceeds several gigabits per second. Distorted data due to the non-idealties in systems are likely to reduce the signal quality aggressively and result in intolerable transmission errors in worst case scenarios, thus affect the system effective bandwidth. Hence, additional trainings such as transmitter (Tx) and receiver (Rx) EQ trainings for SI purpose are inserted into the interface training. Besides, a simplified system architecture with unsymmetrical placement of adaptive Rx and Tx EQs in a single link device is proposed and analyzed by using different coefficient adaptation algorithms. This architecture enables to reduce a large number of EQs through the training, especially in case of parallel links. Meanwhile, considerable power and chip area are saved. Finally, high-speed I/O driver against PVT variations is discussed. Critical issues such as overshoot and undershoot interfering with the data are primarily accompanied by impedance mismatch between the I/O driver and its transmitting channel. By applying PVT compensation technique I/O driver impedances can be effectively calibrated close to the target value. Different digital impedance calibration algorithms against PVT variations are implemented and compared for achieving fast calibration and low power requirements
    corecore