
    ์ตœ์ ์— ๊ฐ€๊นŒ์šด ํƒ€์ด๋ฐ ์ ์‘์„ ์œ„ํ•ด ์น˜์šฐ์นœ ๋ฐ์ดํ„ฐ ๋ ˆ๋ฒจ๊ณผ ๋ˆˆ ๊ฒฝ์‚ฌ ๋””ํ…ํ„ฐ๋ฅผ ์‚ฌ์šฉํ•œ ์ตœ๋Œ€ ๋ˆˆํฌ๊ธฐ์ถ”์  ํด๋Ÿญ ๋ฐ ๋ฐ์ดํ„ฐ ๋ณต์›ํšŒ๋กœ ์„ค๊ณ„

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, February 2021. Advisor: Deog-Kyoon Jeong (정덕균). In this thesis, the design of a maximum-eye-tracking CDR (MET-CDR) for minimum bit error rate (BER) is proposed. The proposed CDR does not require a BER counter or an eye-opening monitor with an iterative procedure to find the near-optimal sampling phase. The biased data level (biased dLev), obtained from a weighted sum of the error-sampler outputs UP and DN, extracts the actual eye-height information in the presence of pre-cursor ISI. Two samplers operating at two slightly different timings, separated by ΔT, detect the current eye height and the polarity of the eye slope, so that the CDR tracks the maximum eye height, where the slope becomes zero. Measured results show that the sampling phase of the maximum eye height and that of the minimum BER match well.
A prototype receiver fabricated in a 28-nm CMOS process operates at 26 Gb/s with an eye opening of 0.25 UI and consumes 87 mW while equalizing 23.5 dB of loss at 13 GHz.
Contents: Abstract; Contents; List of Figures; List of Tables.
Chapter 1 Introduction: 1.1 Motivation; 1.2 Thesis Organization.
Chapter 2 Backgrounds: 2.1 Receiver Front-End (2.1.1 Channel; 2.1.2 Equalizer; 2.1.3 CDR); 2.2 Prior Arts on Clock Recovery (2.2.1 BB-CDR; 2.2.2 BER-Based CDR; 2.2.3 EOM-Based CDR); 2.3 Concept of the Proposed CDR.
Chapter 3 Maximum-Eye-Tracking CDR with Biased Data-Level and Eye Slope Detector: 3.1 Overview; 3.2 Design of MET-CDR (3.2.1 Eye Height Information from Biased Data-Level; 3.2.2 Eye Slope Detector and Adaptation Algorithm; 3.2.3 Architecture and Implementation; 3.2.4 Verification of the Algorithm; 3.2.5 Analysis on the Biased Data-Level); 3.3 Expansion of MET-CDR to PAM4 Signaling (3.3.1 MET-CDR with PAM4; 3.3.2 Considerations for PAM4).
Chapter 4 Measurement Results.
Chapter 5 Conclusion.
Appendix A MATLAB Code for Simulating Receiver with MET-CDR.
Bibliography.
Abstract (in Korean).
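The tracking loop described above can be pictured as sign-based hill climbing on the eye-height curve. The following is a minimal sketch, not the thesis's implementation. It assumes a hypothetical concave eye-height profile `eye_height(phase)` that peaks at 0.12 UI. The loop compares the eye height at two sampling phases ΔT apart and steps the phase toward the larger value until it dithers around the zero-slope point.

```python
def eye_height(phase_ui):
    """Hypothetical eye-height profile (arbitrary units) peaking at 0.12 UI."""
    return 1.0 - 8.0 * (phase_ui - 0.12) ** 2


def track_max_eye(start_phase, delta_t=0.02, step=0.005, iterations=200):
    """Bang-bang phase update driven only by the polarity of the eye slope."""
    phase = start_phase
    for _ in range(iterations):
        # Two samplers spaced delta_t apart; their difference gives the slope sign.
        early = eye_height(phase - delta_t / 2)
        late = eye_height(phase + delta_t / 2)
        if late > early:
            phase += step   # eye is larger later in time: move the clock later
        elif early > late:
            phase -= step   # eye is larger earlier in time: move the clock earlier
    return phase


print(round(track_max_eye(-0.2), 3))
```

Because only the slope polarity is used, the loop settles into a dither of about one step around the peak. It needs no BER counter or eye scan, which mirrors the property claimed in the abstract.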

    Design of Asymmetric Simultaneous Bidirectional Transceivers for Next-Generation Automotive Camera Data Communication

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, February 2022. Advisor: Deog-Kyoon Jeong (정덕균).
์ด ์ˆ˜์น˜๋Š” ์„ฑ๊ณผ ๋Œ€ํ‘œ์ง€ํ‘œ 0.5 ์ดํ•˜๋ฅผ ๊ฐ€์ง€๋Š” ๊ธฐ์กด ๋™์‹œ ์–‘๋ฐฉํ–ฅ ์†ก์ˆ˜์‹ ๊ธฐ์™€์˜ ๋น„๊ต์—์„œ ์ตœ๊ณ ์˜ ์•„์ด ๋งˆ์ง„์„ ๊ธฐ๋กํ•˜์˜€๋‹ค.In this dissertation, design techniques of a highly asymmetric simultaneous bidirectional (SB) transceivers with high-speed PAM-4 and low-speed PAM-2 signals are proposed and demonstrated for the next-generation automotive camera link. In a first prototype design, a PAM-4 transmitter with 10B6Q DC balance code and a PAM-4 adaptive receiver with fixed data and threshold levels (dtLevs) are presented. In PAM-4 transmitter, an area- and power-efficient 10B6Q code for an AC coupled link system that guarantees DC balance and limited run length of six is proposed. Although the input data width of 10 bits is used here, the proposed scheme has an extensibility for the input data width to cover various data types of the camera. On the other hand, in the PAM-4 adaptive receiver, to optimally cancel the sampler offset for a lower BER, instead of adjusting dtLevs, the gain of a programmable gain amplifier is adjusted adaptively under fixed dtLevs. The prototype chips including above proposed 10B6Q code and fixed dtLevs are fabricated in 40-nm CMOS technology and tested in chip-on-board assembly. The 10B6Q code only occupies an active area of 0.0009 mm2 with a synthesized gate count of 645. It also consumes 0.23 mW at the operating clock frequency of 667 MHz. The transmitter with 10B6Q code delivers 8-Gb/s PAM-4 signal to the adaptive receiver using fixed dtLevs through a lossy 12-m cable (22-dB channel loss) with a BER of 1E-8, and the eye margin larger than 0.15 UI x 50 mV is measured for a BER of 1E-5. The proto-type chips consume 65.2 mW (excluding PLL), exhibiting an FoM of 0.37 pJ/b/dB. In a second prototype design advanced from the first prototypes, An asymmetric SB transceivers incorporating a 12-Gb/s PAM-4 forward channel and a 125-Mb/s PAM-2 back channel are presented and demonstrated. 
The proposed wide linear range (WLR) hybrid combined with a gmC low-pass filter and an echo canceller effectively suppresses the outbound signals by more than 24dB. In addition, linear range enhancer which forms a gain attenuator with WLR hybrid breaks the trade-off between the linearity and the amplitude of the PAM-4 signal. The SB transceiver chips are separately fabricated in 40-nm CMOS technology. Using above design techniques, both PAM-4 and PAM-2 SB transceivers achieve BER less than 1E-12 over a 5-m channel (15.9 dB channel loss), consuming 78.4 mW. The overall transceivers achieve an FoM of 0.41 pJ/b/dB and eye margin (at BER of 1E-12) of 0.15 UI and 0.57 UI for the forward PAM-4 and back PAM-2 signals, respectively, under SB communication. This is the best eye margin compared to the prior art SB transceivers with an FoM less than 0.5.CHAPTER 1 INTRODUCTION 1 1.1 MOTIVATION 1 1.2 DISSERTATION ORGANIZATION 4 CHAPTER 2 BACKGROUND ON AUTOMOTIVE CAMERA LINK 6 2.1 OVERVIEW 6 2.2 SYSTEM REQUIREMENTS 10 2.2.1 CHANNEL 10 2.2.2 POWER OVER DIFFERENTIAL LINE (PODL) 12 2.2.3 AC COUPLING AND DC BALANCE CODE 15 2.2.4 SIMULTANEOUS BIDIRECTIONAL COMMUNICATION 18 2.2.4.1 HYBRID 18 2.2.4.2 ECHO CANCELLER 20 2.2.5 ADAPTIVE RECEIVE EQUALIZATION 22 CHAPTER 3 AREA AND POWER EFFICIENT 10B6Q ENCODER FOR DC BALANCE 25 3.1 INTRODUCTION 25 3.2 PRIOR WORKS 28 3.3 PROPOSED AREA- AND POWER-EFFICIENT 10B6Q PAM-4 CODER 30 3.4 DESIGN OF THE 10B6Q CODE 33 3.4.1 PAM-4 DC BALANCE 35 3.4.2 PAM-4 TRANSITION DENSITY 35 3.4.3 10B6Q DECODER 37 3.5 IMPLEMENTATION AND MEASUREMENT RESULTS 40 CHAPTER 4 PAM-4 TRANSMITTER AND ADAPTIVE RECEIVER WITH FIXED DATA AND THRESHOLD LEVELS 45 4.1 INTRODUCTION 45 4.2 PRIOR WORKS 47 4.3 ARCHITECTURE AND IMPLEMENTATION 49 4.2.1 PAM-4 TRANSMITTER 49 4.2.2 PAM-4 ADAPTIVE RECEIVER 52 4.3 MEASUREMENT RESULTS 62 CHAPTER 5 ASYMMETRIC SIMULTANEOUS BIDIRECTIONAL TRANSCEIVERS USING WIDE LINEAR RANGE HYBRID 68 5.1 INTRODUCTION 68 5.2 PRIOR WORKS 70 5.3 WIDE LINEAR RANGE (WLR) 
HYBRID 75 5.3 IMPLEMENTATION 78 5.3.1 SERIALIZER (SER) DESIGN 78 5.3.2 DESERIALIZER (DES) DESIGN 79 5.4 HALF CIRCUIT ANALYSIS OF WLR HYBRID AND LRE 82 5.5 MEASUREMENT RESULTS 88 CHAPTER 6 CONCLUSION 97 BIBLIOGRAPHY 99 ์ดˆ ๋ก 106๋ฐ•
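The two properties the 10B6Q code guarantees, DC balance and a maximum run length of six, can be checked on any PAM-4 symbol stream. The following is a minimal sketch under stated assumptions. It does not reproduce the 10B6Q mapping, which the abstract does not describe, and it assumes symbol levels of −3, −1, +1, +3.

The sketch does three things:
* It checks run length and running digital sum (RDS) on a symbol stream.
* It counts the 6-symbol words whose levels sum to zero. There are 580 such words, fewer than the 1,024 needed for 10-bit inputs, so a 10B6Q code cannot use only individually zero-disparity words.
* It recomputes the reported FoMs, assuming FoM = power / (data rate × channel loss) with the 12-Gb/s forward rate for the second prototype.

```python
from itertools import product

LEVELS = (-3, -1, 1, 3)  # assumed PAM-4 symbol levels


def max_run_length(symbols):
    """Length of the longest run of identical consecutive symbols."""
    if not symbols:
        return 0
    longest = run = 1
    for prev, cur in zip(symbols, symbols[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest


def running_digital_sum(symbols):
    """Cumulative sum of symbol levels; a bounded RDS implies DC balance."""
    total, rds = 0, []
    for s in symbols:
        total += s
        rds.append(total)
    return rds


def zero_disparity_words(length=6):
    """Number of PAM-4 words of the given length whose levels sum to zero."""
    return sum(1 for w in product(LEVELS, repeat=length) if sum(w) == 0)


def fom_pj_per_bit_per_db(power_mw, rate_gbps, loss_db):
    # mW / (Gb/s) = pJ/b, then normalize by the channel loss in dB.
    return power_mw / rate_gbps / loss_db


print(zero_disparity_words())                           # 580
print(round(fom_pj_per_bit_per_db(65.2, 8, 22), 2))     # 0.37
print(round(fom_pj_per_bit_per_db(78.4, 12, 15.9), 2))  # 0.41
```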

    Energy-Efficient Receiver Design for High-Speed Interconnects

    High-speed interconnects are vital to the operation of high-performance computing and communication systems, since they determine the ultimate bandwidth, or data rate, at which information can be exchanged. Optical interconnects and high-order modulation formats are considered solutions for achieving the envisioned speed and power efficiency of future interconnects. A key factor common to both is the availability of energy-efficient receivers with superior sensitivity. Enhancing receiver sensitivity requires either improving the signal-to-noise ratio (SNR) of the front-end circuits or equalization that mitigates the detrimental inter-symbol interference (ISI). In this dissertation, architectural and circuit-level energy-efficient techniques serving these goals are presented. First, an avalanche photodetector (APD)-based optical receiver is described, which uses non-return-to-zero (NRZ) modulation and is applicable to burst-mode operation. To improve the overall optical link energy efficiency as well as the link bandwidth, this optical receiver is designed for high sensitivity and high reconfiguration speed. The high sensitivity comes from optimizing the front-end SNR, by adjusting the APD responsivity via its reverse bias voltage. It is further supported by 2-tap feedforward equalization (FFE) and 2-tap decision feedback equalization (DFE), both implemented in a current-integrating fashion. The high reconfiguration speed is enabled by the proposed integrating dc and amplitude comparators, which eliminate the RC settling-time constraints. The receiver circuits, excluding the APD die, are fabricated in 28-nm CMOS technology. The optical receiver achieves a bit error rate (BER) better than 1E−12 at −16-dBm optical modulation amplitude (OMA), a 2.24-ns reconfiguration time with 5-dB dynamic range, and 1.37-pJ/b energy efficiency at 25 Gb/s.
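An optimum APD bias exists because raising the avalanche gain M amplifies the signal but also multiplies shot noise by the excess noise factor F(M). The following is a minimal sketch using the textbook McIntyre excess-noise model. It shows why an interior SNR optimum exists. All parameter values are hypothetical, including the unity-gain responsivity, optical power, bandwidth, amplifier noise density, and ionization ratio. This is not the receiver's actual bias-control method.

```python
Q = 1.602e-19  # electron charge (C)


def excess_noise_factor(gain, k_ratio):
    """McIntyre excess noise factor F(M) for ionization-coefficient ratio k."""
    return k_ratio * gain + (1.0 - k_ratio) * (2.0 - 1.0 / gain)


def apd_snr(gain, responsivity=0.8, power_w=25e-6, dark_a=10e-9,
            bandwidth_hz=12.5e9, thermal_a2_per_hz=(20e-12) ** 2, k_ratio=0.5):
    """Electrical SNR of an APD front end (all default values are hypothetical)."""
    primary = responsivity * power_w                  # unity-gain photocurrent (A)
    signal = (gain * primary) ** 2
    shot = (2 * Q * gain ** 2 * excess_noise_factor(gain, k_ratio)
            * (primary + dark_a) * bandwidth_hz)      # multiplied shot noise (A^2)
    thermal = thermal_a2_per_hz * bandwidth_hz        # input-referred TIA noise (A^2)
    return signal / (shot + thermal)


gains = [1 + 0.1 * i for i in range(291)]             # sweep M = 1.0 ... 30.0
m_opt = max(gains, key=apd_snr)
print(round(m_opt, 1), round(apd_snr(m_opt) / apd_snr(1.0), 1))
```

At low gain, thermal noise dominates and SNR grows roughly as M². At high gain, the k·M term in F(M) makes shot noise grow faster than the signal. The best reverse bias sits between these two regimes.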
Second, a 4-level pulse amplitude modulation (PAM4) wireline receiver is described. It incorporates continuous-time linear equalizers (CTLEs) and a 2-tap direct DFE dedicated to compensating the first and second post-cursor ISI. The direct DFE in a PAM4 receiver (PAM4-DFE) is made possible by the proposed CMOS track-and-regenerate slicer, which offers rail-to-rail digital feedback signals with significantly improved clock-to-Q delay. The reduced slicer delay relaxes the settling-time constraint of the summer circuits and allows the stringent DFE timing constraint to be met. With a direct DFE employing the proposed slicer, inductor-based bandwidth enhancement and loop-unrolling techniques, which can be power- and area-intensive, are not required. Fabricated in 28-nm CMOS technology, the PAM4 receiver achieves a BER better than 1E−12 and 1.1-pJ/b energy efficiency at 60 Gb/s, measured over a channel with 8.2-dB loss at the Nyquist frequency. Third, digital neural-network-enhanced FFEs (NN-FFEs) for PAM4 analog-to-digital converter (ADC)-based optical interconnects are described. The proposed NN-FFEs employ a custom learnable piecewise linear (PWL) activation function to tackle nonlinearities with short memory lengths. Conventional Volterra equalizers use multipliers to generate the nonlinear terms. The proposed NN-FFEs instead use the custom PWL activation function for nonlinear operations, which reduces the required number of multipliers and improves area and power efficiency. Applications to optical interconnects based on micro-ring modulators (MRMs) are demonstrated with simulation results for 50-Gb/s and 100-Gb/s links adopting PAM4 signaling. The proposed NN-FFEs and conventional Volterra equalizers are synthesized with standard-cell libraries in a commercial 28-nm CMOS technology, and their power consumption and performance are compared.
Compared with a Volterra equalizer that yields a similar improvement in symbol error rate (SER), the proposed NN-FFEs achieve more than 37% lower power overhead.
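A PWL activation lets a small network create nonlinearity with comparisons, additions, and scalings instead of the sample cross-products a Volterra equalizer needs. The following is a minimal sketch of a hypothetical single-hidden-layer NN-FFE forward pass. The hinge-sum form of the PWL function, the layer sizes, and all parameter values are illustrative assumptions, not the dissertation's design. The sketch also counts the kernel coefficients of a second-order Volterra equalizer with memory N (N linear plus N(N+1)/2 quadratic terms), the kind of baseline the abstract compares against.

```python
def pwl(x, bias, slope0, hinges):
    """Continuous piecewise-linear activation.

    f(x) = bias + slope0 * x + sum(c * max(0, x - b) for (b, c) in hinges);
    each hinge at breakpoint b changes the slope by c. In a trained NN-FFE,
    bias, slope0, and the (b, c) pairs would be learnable parameters.
    """
    return bias + slope0 * x + sum(c * max(0.0, x - b) for b, c in hinges)


def nn_ffe(window, hidden_taps, act, out_weights):
    """One hidden layer: linear FFE taps -> PWL activation -> linear combiner."""
    hidden = [pwl(sum(w * x for w, x in zip(taps, window)), *act)
              for taps in hidden_taps]
    return sum(v * h for v, h in zip(out_weights, hidden))


def volterra2_terms(memory):
    """Kernel coefficients of a 2nd-order Volterra equalizer with memory N."""
    return memory + memory * (memory + 1) // 2


# Hypothetical activation: slope 1 inside [-1, 1], slope 0.4 outside (soft compression).
act = (-0.6, 0.4, [(-1.0, 0.6), (1.0, -0.6)])
window = [0.2, 1.1, -0.4]                   # recent received samples (hypothetical)
hidden_taps = [[0.1, 0.9, 0.2], [0.0, 0.5, -0.5]]
print(nn_ffe(window, hidden_taps, act, [1.0, 0.3]))
print(volterra2_terms(7))                   # 35
```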

    High Speed Reconfigurable NRZ/PAM4 Transceiver Design Techniques

    While the majority of wireline standards use simple binary non-return-to-zero (NRZ) signaling, four-level pulse-amplitude modulation (PAM4) standards are emerging to increase bandwidth density. This dissertation proposes efficient implementations of high-speed NRZ/PAM4 transceivers. The first prototype includes a dual-mode NRZ/PAM4 serial I/O transmitter that supports both modulations with minimal power and hardware overhead. A source-series-terminated (SST) transmitter achieves a 1.2-Vpp output swing. It employs lookup table (LUT) control of a 31-segment output digital-to-analog converter (DAC) to implement 4-tap feed-forward equalization (FFE) in NRZ mode and 2-tap FFE in PAM4 mode. Transmitter power consumption is reduced with low-overhead analog impedance control in the DAC cells and a quarter-rate serializer built from a tri-state-inverter-based mux with dynamic pre-driver gates. The transmitter is designed to work with a receiver that implements an NRZ/PAM4 decision feedback equalizer (DFE). That DFE uses one finite impulse response (FIR) tap to cancel first post-cursor inter-symbol interference (ISI) and two infinite impulse response (IIR) taps to cancel long-tail ISI. Fabricated in GP 65-nm CMOS, the transmitter occupies 0.060 mm². It achieves 16-Gb/s NRZ operation at 10.4 mW/Gb/s over a channel with 27.6-dB loss at Nyquist, and 32-Gb/s PAM4 operation at 4.9 mW/Gb/s over a channel with 13.5-dB loss at Nyquist. The second prototype is a 56-Gb/s PAM4 quarter-rate wireline receiver implemented in a 65-nm CMOS process. The front end uses a single-stage continuous-time linear equalizer (CTLE) to boost the main cursor and relax the pre-cursor cancellation requirement, so that only a 2-tap pre-cursor FFE is needed on the transmitter side. A 2-tap DFE with one FIR tap and one IIR tap cancels the first post-cursor and long-tail ISI.
The FIR-tap direct feedback is implemented inside the CML slicers to relax the critical DFE timing and maximize the achievable data rate. In addition to the three main data samplers per slice, an error sampler is used for background threshold control. An edge-based sampler performs PLL-based CDR phase detection and also generates information for background DFE tap adaptation. The receiver consumes 4.63 mW/Gb/s and compensates for up to 20.8 dB of loss when operated with a 2-tap FFE transmitter. The experimental results, compared with the state of the art, show superior power efficiency for the presented prototypes at similar data rates and channel losses. The proposed design techniques are not limited to these specific prototypes and can be applied to any wireline transceiver with a different modulation, data rate, or CMOS technology.
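LUT control of a segmented DAC means that each combination of the current and neighboring symbols maps to a precomputed number of segments driven high. The following is a minimal sketch that builds such a table for a hypothetical 2-tap PAM4 FFE on a 31-segment DAC. The tap values, the peak normalization, and the rounding rule are assumptions, not the prototype's.

```python
from itertools import product

PAM4 = (-1.0, -1 / 3, 1 / 3, 1.0)   # normalized PAM4 symbol levels
SEGMENTS = 31                        # number of DAC segments, as in the abstract


def build_ffe_lut(taps, segments=SEGMENTS):
    """Map (d[n], d[n-1], ...) to the number of DAC segments driven high."""
    scale = sum(abs(c) for c in taps)            # keep the peak output at full scale
    lut = {}
    for symbols in product(PAM4, repeat=len(taps)):
        v = sum(c * d for c, d in zip(taps, symbols)) / scale   # in [-1, 1]
        lut[symbols] = round((v + 1) / 2 * segments)            # segments pulled high
    return lut


def segments_to_level(n_high, segments=SEGMENTS):
    """Normalized output when n_high segments pull high and the rest pull low."""
    return (2 * n_high - segments) / segments


lut = build_ffe_lut([0.8, -0.2])   # main cursor plus one post-cursor tap (hypothetical)
print(lut[(1.0, 1.0)])             # 25 of 31 segments high
print(segments_to_level(lut[(1.0, -1.0)]))
```

Because the table is precomputed, the same DAC can switch between NRZ and PAM4 modes, or change tap weights, by reloading table contents rather than adding hardware.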

    Design of energy efficient high speed I/O interfaces

    Energy efficiency has become a key performance metric for wireline high-speed I/O interfaces. Consequently, the design of low-power I/O interfaces has garnered great interest, mostly focused on active power reduction techniques at the peak data rate. In practice, most systems exhibit a wide range of data transfer patterns, so low energy-per-bit operation at the peak data rate does not necessarily translate into low overall energy. I/O interfaces whose power consumption scales with the data-rate requirement are therefore desirable. Rapid on-off I/O interfaces can scale power with data-rate requirements without severely affecting either the latency or the throughput of the interface. In this work, we explore circuit techniques for designing rapid on-off high-speed wireline I/O interfaces and digital fractional-N PLLs. A burst-mode transmitter suitable for rapid on-off I/O interfaces is presented that achieves a 6-ns turn-on time. It uses a fast-frequency-settling ring oscillator in a digital multiplying delay-locked loop (MDLL) and a rapid on-off biasing scheme for the current-mode output driver. Fabricated in a 90-nm CMOS process, the prototype achieves 2.29-mW/Gb/s energy efficiency at a peak data rate of 8 Gb/s. For 32-byte data bursts, a 125× change in effective data rate (8 Gb/s to 64 Mb/s) results in a 67× change in transmitter power consumption (18.29 mW to 0.27 mW). This corresponds to only a 2× degradation in energy efficiency (2.29 mW/Gb/s to 4.24 mW/Gb/s). We also present an analytical bit error rate (BER) computation technique for this transmitter under rapid on-off operation, which uses MDLL settling measurement data in conjunction with always-on transmitter measurements. This technique indicates that the BER bathtub width at a BER of 10^−12 is 0.65 UI during rapid on-off operation and 0.72 UI during always-on operation.
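The power-scaling figures above can be cross-checked with simple arithmetic, since energy per bit is power divided by data rate. The following sketch recomputes the ratios from the reported numbers only. The recomputed low-rate efficiency comes out near 4.22 mW/Gb/s, slightly below the quoted 4.24 mW/Gb/s, presumably because the 0.27-mW figure is rounded.

```python
# Reported figures from the abstract above.
peak_rate_gbps, low_rate_gbps = 8.0, 0.064       # 8 Gb/s and 64 Mb/s
peak_power_mw, low_power_mw = 18.29, 0.27

rate_ratio = peak_rate_gbps / low_rate_gbps      # change in effective data rate
power_ratio = peak_power_mw / low_power_mw       # change in transmitter power
eff_peak = peak_power_mw / peak_rate_gbps        # mW/(Gb/s), numerically equal to pJ/b
eff_low = low_power_mw / low_rate_gbps
print(f"{rate_ratio:.0f}x rate, {power_ratio:.1f}x power, "
      f"{eff_peak:.2f} -> {eff_low:.2f} mW/Gb/s ({eff_low / eff_peak:.2f}x)")
```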
Next, a technique based on pulse response estimation is proposed that enables burst-mode operation of baud-rate sampling receivers operating over high-loss channels. Such receivers typically employ discrete-time equalization to combat inter-symbol interference. Implementation details are provided for a receiver chip, fabricated in 65-nm CMOS technology, that demonstrates the efficacy of the proposed technique. A low-complexity pulse response estimation technique is also presented for low-power receivers that do not employ discrete-time equalizers. We also present techniques for implementing a highly digital fractional-N PLL. The PLL uses a phase-interpolator-based fractional divider to improve the quantization-noise-shaping properties of a 1-bit ΔΣ frequency-to-digital converter. Fabricated in a 65-nm CMOS process, the prototype calibration-free fractional-N Type-II PLL uses the proposed frequency-to-digital converter in place of a high-resolution time-to-digital converter. While generating a 5.054-GHz output from a 31.25-MHz input, it achieves 848-fs rms integrated jitter (1 kHz to 30 MHz) and −101-dBc/Hz in-band phase noise.
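Fractional-N synthesis dithers an integer divide ratio so that its average equals the fractional target. Here that target is 5.054 GHz / 31.25 MHz = 161.728. The following sketch uses a generic first-order, accumulator-based 1-bit ΔΣ modulator to generate the divide-ratio sequence. It illustrates the averaging idea and first-order noise shaping only. It is not the dissertation's phase-interpolator-based fractional divider or its frequency-to-digital converter.

```python
from fractions import Fraction


def delta_sigma_divider(ratio, cycles):
    """First-order 1-bit ΔΣ: choose N or N + 1 each reference cycle.

    The accumulator carries the fractional remainder; an overflow selects
    N + 1, so the average divide ratio converges to `ratio` exactly.
    """
    n_int = ratio.numerator // ratio.denominator
    frac = ratio - n_int
    acc = Fraction(0)
    out = []
    for _ in range(cycles):
        acc += frac
        if acc >= 1:
            acc -= 1
            out.append(n_int + 1)   # overflow: divide by N + 1
        else:
            out.append(n_int)       # no overflow: divide by N
    return out


ratio = Fraction(5_054_000_000, 31_250_000)   # f_out / f_ref = 161.728
seq = delta_sigma_divider(ratio, 1000)
print(float(Fraction(sum(seq), len(seq))))
```

Exact rational arithmetic keeps the example free of floating-point drift. A hardware accumulator would use a fixed-width integer modulus instead.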

    ๋ฐ์ดํ„ฐ ์ „์†ก๋กœ ํ™•์žฅ์„ฑ๊ณผ ๋ฃจํ”„ ์„ ํ˜•์„ฑ์„ ํ–ฅ์ƒ์‹œํ‚จ ๋‹ค์ค‘์ฑ„๋„ ์ˆ˜์‹ ๊ธฐ๋“ค์— ๊ด€ํ•œ ์—ฐ๊ตฌ

    Thesis (Ph.D.) -- Seoul National University Graduate School: School of Electrical Engineering and Computer Science, February 2013. Advisor: Deog-Kyoon Jeong (정덕균). Two types of serial data communication receivers that adopt a multichannel architecture for high aggregate I/O bandwidth are presented. Two techniques for collaboration and sharing among channels are proposed: one enhances the loop linearity of multichannel receivers and the other enhances their channel expandability. The first proposed receiver employs a collaborative timing recovery scheme. It shares the outputs of all phase detectors (PDs) among channels to extract the common timing information in a PAM-4 multilevel signaling architecture. The shared timing information is processed by a common global loop filter and is used to update the phase of the voltage-controlled oscillator with better rejection of per-channel noise. In addition to collaborative timing recovery, a simple linearization technique for binary PDs is proposed. The technique realizes a high-rate oversampling PD at a hardware cost equivalent to that of a conventional 2×-oversampling clock and data recovery (CDR) circuit. The first receiver, which exploits the collaborative timing recovery architecture, is designed in 45-nm CMOS technology. A single data lane occupies 0.195 mm² and consumes a relatively low 17.9 mW at 6 Gb/s from a 1.0-V supply, for a power efficiency of 2.98 mW/Gb/s. The simulated jitter is about 0.034 UI rms for an input jitter of 0.03 UI rms. With the PD linearization technique, the loop bandwidth stays relatively constant at about 7.3 MHz regardless of the data-stream noise. Unlike the first receiver, the second proposed multichannel receiver is designed to reduce the hardware complexity of each lane. It employs calibration logic shared among channels and yet achieves superior channel expandability with slim data lanes.
A shared global calibration control is used in a forwarded-clock receiver based on a multiphase delay-locked loop (DLL). During a calibration period, it accomplishes skew calibration, equalizer adaptation, and phase lock for all channels, which reduces hardware overhead and the area required by each data lane. The second receiver, a forwarded-clock design, is implemented in 90-nm CMOS technology. It achieves error-free eye openings of more than 0.5 UI across 9- to 28-inch Nelco 4000-6 microstrips at 4 to 7 Gb/s, and more than 0.42 UI at data rates of up to 9 Gb/s. At 7 Gb/s and a 1.35-V supply, the data lane occupies only 0.152 mm² and consumes 69.8 mW, while the rest of the receiver occupies 0.297 mm² and consumes 56 mW.
Contents:
1. Introduction: 1.1 Motivations; 1.2 Thesis Organization.
2. Previous Receivers for Serial-Data Communications: 2.1 Classification of the Links; 2.2 Clocking architecture of transceivers; 2.3 Components of receiver (2.3.1 Channel loss; 2.3.2 Equalizer; 2.3.3 Clock and data recovery circuit: 2.3.3.1 Basic architecture, 2.3.3.2 Phase detector [2.3.3.2.1 Linear phase detector, 2.3.3.2.2 Binary phase detector], 2.3.3.3 Frequency detector, 2.3.3.4 Charge pump, 2.3.3.5 Voltage controlled oscillator and delay-line; 2.3.4 Loop dynamics of PLL; 2.3.5 Loop dynamics of DLL).
3. The Proposed PLL-Based Receiver with Loop Linearization Technique: 3.1 Introduction; 3.2 Motivation; 3.3 Overview of binary phase detection; 3.4 The proposed BBPD linearization technique (3.4.1 Architecture of the proposed PLL-based receiver; 3.4.2 Linearization technique of binary phase detection; 3.4.3 Rotational pattern of sampling phase offset); 3.5 PD gain analysis and optimization; 3.6 Loop dynamics of the 2nd-order CDR; 3.7 Verification with the time-accurate behavioral simulation; 3.8 Summary.
4. The Proposed DLL-Based Receiver with Forwarded-Clock: 4.1 Introduction; 4.2 Motivation; 4.3 Design consideration; 4.4 Architecture of the proposed forwarded-clock receiver; 4.5 Circuit description (4.5.1 Analog multi-phase DLL; 4.5.2 Dual-input interpolating delay cells; 4.5.3 Dedicated half-rate data samplers; 4.5.4 Cherry-Hooper continuous-time linear equalizer; 4.5.5 Equalizer adaptation and phase-lock scheme); 4.6 Measurement results.
5. Conclusion.
6. Bibliography.
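A binary PD reports only the sign of the phase error. If its decisions are averaged over a set of deliberately rotated sampling-phase offsets, the average becomes proportional to the error, which is the general idea behind linearizing a bang-bang PD. The following is a minimal sketch that assumes a noiseless PD and M offsets spread uniformly over a span of `span_ui`. It illustrates the principle only, not the receiver's circuit or its actual rotation pattern.

```python
def bbpd(phase_error):
    """Binary (bang-bang) PD: only the polarity of the error is observed."""
    return 1 if phase_error > 0 else -1 if phase_error < 0 else 0


def linearized_pd(phase_error, num_offsets=10, span_ui=1.0):
    """Average BBPD decisions over uniformly rotated sampling-phase offsets.

    Inside +/- span_ui / 2 the output is a staircase approximation of
    2 * phase_error / span_ui; outside that range it saturates at +/- 1.
    """
    total = 0
    for i in range(num_offsets):
        offset = (i + 0.5) / num_offsets * span_ui - span_ui / 2
        total += bbpd(phase_error + offset)
    return total / num_offsets


for e in (-0.3, -0.1, 0.0, 0.1, 0.3):
    print(e, linearized_pd(e))
```

A linear PD gain keeps the CDR loop bandwidth roughly independent of input jitter, which is consistent with the nearly constant loop bandwidth reported in the abstract.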

    Design Techniques for High Performance Serial Link Transceivers

    Increasing data rates over electrical channels with significant frequency-dependent loss is difficult because of excessive inter-symbol interference (ISI). To achieve sufficient link margins at high rates, I/O system designers implement equalization in the transmitters. They are also motivated to consider modulation formats that are more spectrally efficient than the common PAM-2 scheme, such as PAM-4 and duobinary. The first work reviews when to consider the PAM-4 and duobinary formats, since the modulation scheme that yields the highest system margin at a given data rate depends on the channel loss profile. It also presents a 20-Gb/s triple-mode transmitter capable of efficiently implementing these three modulation schemes with three-tap feedforward equalization. A statistical link modeling tool, which models ISI, crosstalk, random noise, and timing jitter, is developed to compare the three common modulation formats over electrical backplane channel models. To improve duobinary modulation efficiency, a low-power quarter-rate duobinary precoder circuit is proposed that significantly improves timing margin relative to full-rate precoders. In addition, as serial I/O data rates scale above 10 Gb/s, crosstalk between neighboring channels degrades system bit error rate (BER) performance. The next work presents receive-side circuitry that merges the cancellation of near-end and far-end crosstalk (NEXT/FEXT). This circuitry can automatically adapt to different channel environments and to variations in process, voltage, and temperature. NEXT cancellation is realized with a novel 3-tap FIR filter that combines two traditional FIR filter taps with a continuous-time band-pass filter IIR tap for efficient crosstalk cancellation. All of its tap coefficients are determined automatically by an on-die sign-sign least-mean-square (SS-LMS) adaptation engine.
FEXT cancellation is realized by coupling the aggressor signal through a differentiator circuit whose gain is adjusted automatically by a power-detection-based adaptation loop. In conclusion, the proposed transmitter-side and receiver-side architectures together offer a good solution for improving the performance of high-speed serial I/O links. They overcome physical channel loss and adjacent-channel noise as systems become more complex.
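The sign-sign LMS rule used for the NEXT-canceller taps updates each coefficient by a fixed step in the direction sign(error) × sign(input). The following is a minimal sketch that applies this rule to a hypothetical 2-tap FIR canceller. The canceller removes crosstalk from one aggressor onto a binary victim lane. The coupling values, step size, and decision-directed error are assumptions, and the sketch omits the band-pass IIR tap and the FEXT path described above.

```python
import random


def sign(x):
    return (x > 0) - (x < 0)


def ss_lms_next_canceller(coupling, n_symbols=20000, mu=1e-3, seed=1):
    """Adapt FIR taps on the aggressor data to cancel crosstalk on the victim."""
    rng = random.Random(seed)
    taps = [0.0] * len(coupling)
    history = [0.0] * len(coupling)          # aggressor symbols, newest first
    for _ in range(n_symbols):
        victim = rng.choice((-1.0, 1.0))
        history = [rng.choice((-1.0, 1.0))] + history[:-1]
        received = victim + sum(c * a for c, a in zip(coupling, history))
        y = received - sum(w * a for w, a in zip(taps, history))  # after cancellation
        decision = 1.0 if y >= 0 else -1.0   # slicer output
        error = y - decision                 # residual crosstalk (decision-directed)
        # Sign-sign LMS: fixed step size, direction from sign(error) * sign(input).
        taps = [w + mu * sign(error) * sign(a) for w, a in zip(taps, history)]
    return taps


print([round(w, 3) for w in ss_lms_next_canceller([0.2, 0.1])])
```

Sign-sign updates need only comparators and up/down counters in hardware, which is why they suit an on-die adaptation engine. The cost is a steady-state dither of a few step sizes around the optimum.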

    Design of High-Speed SerDes Transceiver for Chip-to-Chip Communications in CMOS Process

    With the continuous increase in on-chip computation capacity and the exponential growth of data-intensive applications, high-speed data transmission through serial links has become the backbone of modern communication systems. To satisfy massive data-exchange requirements, the data rate of such serial links has risen from several Gb/s to tens of Gb/s. Commercial standards such as Ethernet 400GbE, InfiniBand high data rate (HDR), and the common electrical interface (CEI)-56G are moving toward 40+ Gb/s. As the core component of these links, the transceiver chipset plays a fundamental role in balancing operating speed, power consumption, area, and operating range. Meanwhile, CMOS has become the dominant technology for modern transceiver chips because of its large-scale digital integration capability and aggressive pricing. This research explores advanced techniques that can exploit the maximum operating speed of the CMOS process, and hence provides potential solutions for 40+ Gb/s CMOS transceiver designs. The major contributions are summarized as follows. A low-jitter ring-oscillator-based injection-locked clock multiplier (RILCM) is implemented in a 65-nm CMOS process. It uses a hybrid frequency tracking loop consisting of a traditional phase-locked loop (PLL), a timing-adjusted loop, and a loop selection state machine. In the ring voltage-controlled oscillator, a full-swing pseudo-differential delay cell is proposed to reduce the conversion of device noise into phase noise. To obtain high operating speed and high detection accuracy, a compact timing-adjusted phase detector tightly combined with a well-matched charge pump is designed.
Meanwhile, lock-loss detection and lock recovery are devised to give the RILCM a lock-acquisition ability similar to that of a conventional PLL. This removes the need for an initial frequency set-up aid and prevents potential loss of lock. Experimental results show that the figure of merit of the designed RILCM reaches −247.3 dB, which is better than previous RILCMs and comparable even to large-area LC-ILCMs. The transmitter (TX) and receiver (RX) chips are separately designed and fabricated in a 65-nm CMOS process. The TX chip employs a quarter-rate multi-multiplexer (MUX)-based 4-tap feed-forward equalizer (FFE) to pre-distort the output. To increase the maximum operating speed, a bandwidth-enhanced 4:1 MUX that eliminates the charge-sharing effect is proposed. To produce the quarter-rate parallel data streams with appropriate delays, a compact latch array with an interleaved-retiming technique is designed. The RX chip employs a two-stage continuous-time linear equalizer (CTLE) as the analog front end. It also integrates an improved clock and data recovery circuit to extract the sampling clocks and retime the incoming data. To automatically balance jitter tracking and jitter suppression, passive low-pass filters with adaptively adjusted bandwidth are introduced into the data-sampling path. To optimize the linearity of phase interpolation, a time-averaging-based compensating phase interpolator is proposed. For equalization, a combined TX-FFE and RX-CTLE compensates for the channel loss. A low-cost sign zero-forcing adaptation algorithm based on edge-data correlation is proposed to automatically adjust the TX-FFE's tap weights. Measurement results show that the fabricated TX/RX chipset can deliver 40-Gb/s random data at a bit error rate of … over a channel with 16-dB loss at the half-baud frequency, while consuming a total power of 370 mW.
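Phase-interpolator nonlinearity is easy to see in an idealized case: mixing two sinusoidal clocks 90° apart with linearly stepped weights. The output phase is then atan2(w, 1 − w) rather than the ideal w × 90°. The following sketch computes the resulting integral nonlinearity (INL) for a conventional linear-weight interpolator, which comes out to about 4° per quadrant. It assumes pure sinusoids and no bandwidth limits, and it does not model the time-averaging compensating interpolator proposed in the dissertation.

```python
import math


def pi_phase_deg(weight):
    """Phase of (1 - w) * cos(wt) + w * sin(wt), i.e. mixing 0-deg and 90-deg clocks."""
    return math.degrees(math.atan2(weight, 1.0 - weight))


def pi_inl_deg(steps=64):
    """Worst-case deviation from the ideal linear phase over one 90-degree quadrant."""
    worst = 0.0
    for k in range(steps + 1):
        w = k / steps
        worst = max(worst, abs(pi_phase_deg(w) - 90.0 * w))
    return worst


print(round(pi_inl_deg(), 2))
```

The error is zero at w = 0, 0.5, and 1 and peaks near w ≈ 0.24 and 0.76. That recurring pattern is the kind of systematic nonlinearity a compensating interpolator aims to remove.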

    Energy-efficient systems for information transfer and processing

    Machine learning (ML) systems are proving highly useful in tackling the data deluge of the big-data era, thanks to the exponential increase in computing power. Current ML systems adopt either centralized cloud computing or distributed edge computing, and in both, energy efficiency has been drawing increasing attention. In cloud computing, data transfer due to inter-chip, inter-board, inter-shelf, and inter-rack communication (the I/O interface) within data centers is one of the dominant energy costs, and this will intensify as high-performance computing in data centers demands ever more I/O bandwidth. In edge computing, on the other hand, energy efficiency is the primary design challenge, because mobile devices have limited energy, computation, and storage resources. This challenge is exacerbated by the need to embed ML algorithms such as convolutional neural networks (CNNs) to enable local on-device inference. In this dissertation, we investigate techniques to address these challenges. To address the energy-efficiency challenge in data centers, this dissertation focuses on reducing the energy consumption of the I/O interface. Specifically, in emerging analog-to-digital converter (ADC)-based multi-Gb/s serial link receivers, power dissipation is dominated by the ADC. ADCs in serial links are conventionally designed using signal-to-noise-and-distortion ratio (SNDR) and effective number of bits (ENOB) as performance metrics, because these are the standard metrics for generic ADC design. This dissertation instead uses information-based metrics such as the bit error rate (BER) to design a BER-optimal ADC (BOA) for serial links. First, a theoretical analysis is developed to show when the benefits of a BOA over a conventional uniform ADC (CUA) in a serial link receiver are substantial.
Second, a 4 GS/s, 4-bit on-chip ADC in a 90 nm CMOS process is designed and integrated into a 4 Gb/s serial link receiver to verify the aforementioned analysis. Specifically, measured results demonstrate that a 3-bit BOA receiver outperforms a 4-bit CUA receiver at a BER < 10^-12 and provides 50% power savings in the ADC. In the process, it is demonstrated conclusively that BER, as opposed to ENOB, is the better metric when designing ADCs for serial links. For the problem of resource-constrained computing at the edge, this dissertation tackles the energy-efficient implementation of ML algorithms, particularly CNNs, which have recently gained considerable interest due to their record-breaking performance in many recognition tasks. However, their implementation complexity hinders their deployment on power-constrained embedded platforms. This dissertation develops two techniques for energy-efficient CNN design. The first technique is a predictive CNN (PredictiveNet), which exploits the high sparsity of well-trained CNNs to bypass a large fraction of power-dominant convolutions at runtime without modifying the CNN structure. Analysis supported by simulations is provided to justify PredictiveNet's effectiveness. When applied to both the MNIST and CIFAR-10 datasets, simulation results show that PredictiveNet achieves 7.2× and 4.4× reductions in the computational and representational costs, respectively, compared with a conventional CNN. It is further shown that PredictiveNet enables computational and representational cost reductions of 2.5× and 1.7×, respectively, compared to a state-of-the-art CNN, while incurring only 0.02 classification accuracy loss. The second technique is a variation-tolerant CNN architecture capable of operating in the near-threshold voltage (NTV) regime for aggressive energy efficiency.
It is well known that NTV computing can achieve up to 10× energy savings but is sensitive to process, voltage, and temperature (PVT) variations, which can lead to timing errors. To leverage the great potential of NTV for energy efficiency, this dissertation develops a new statistical error compensation (SEC) technique referred to as rank-decomposed SEC (RD-SEC). RD-SEC makes use of the inherent redundancy in CNNs to handle timing errors due to NTV computing. When evaluated in CNNs for both the MNIST and CIFAR-10 datasets, simulation results in 45 nm CMOS show that RD-SEC enables robust CNNs operating in the NTV regime. Specifically, the proposed RD-SEC can achieve up to 11× improvement in variation tolerance and enable up to 113× reduction in the standard deviation of classification accuracy, while incurring only marginal degradation in the median classification accuracy.
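The abstract states only that PredictiveNet uses sparsity to bypass convolutions at runtime. One common way to realize that idea, sketched here under stated assumptions and not taken from the dissertation, is to compute a cheap most-significant-bit (MSB) estimate of each pre-activation and skip the remaining least-significant-bit (LSB) work whenever the estimate predicts a ReLU output of zero. The integer fixed-point format, the 4-bit LSB split (`LSB_BITS`), and the names `split_weights` and `predictive_relu_dot` are all hypothetical.

```python
import numpy as np

LSB_BITS = 4  # hypothetical: low 4 bits of each integer weight form the LSB part


def split_weights(w):
    """Split signed integer weights into an MSB part and a non-negative LSB part."""
    w = np.asarray(w, dtype=np.int64)
    w_msb = (w >> LSB_BITS) << LSB_BITS  # arithmetic shift keeps the sign
    w_lsb = w - w_msb                    # always in [0, 2**LSB_BITS)
    return w_msb, w_lsb


def predictive_relu_dot(x, w):
    """Approximate ReLU(x . w) with an MSB-only sign prediction first.

    x holds non-negative integer activations (outputs of a previous ReLU).
    Returns (output, skipped). If the MSB-only estimate is <= 0, the LSB
    dot product is bypassed and the output is taken to be 0.
    """
    x = np.asarray(x, dtype=np.int64)
    w_msb, w_lsb = split_weights(w)
    y_msb = int(x @ w_msb)  # cheap low-precision estimate
    if y_msb <= 0:
        return 0, True      # predicted zero: skip the LSB work
    y = y_msb + int(x @ w_lsb)  # exact value; y >= y_msb > 0 because x, w_lsb >= 0
    return max(y, 0), False


print(predictive_relu_dot([4, 1, 0], [20, -35, 10]))  # computed in full
print(predictive_relu_dot([1, 2, 3], [20, -35, 10]))  # skipped, exact result is also 0
```

Because the activations and the LSB parts are both non-negative, the MSB estimate is a lower bound on the exact sum, so a positive prediction is always correct. A non-positive prediction can occasionally zero out a positive output: x = [3, 1, 2] with w = [20, -35, 10] gives an MSB estimate of 0 while the exact result is 45. That occasional error is the accuracy cost traded for the skipped work.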