53 research outputs found

    ์˜คํ”„์…‹ ์ œ๊ฑฐ๊ธฐ์˜ ์ ์‘ ์ œ์–ด ๋“ฑํ™”๊ธฐ์™€ ๋ณด์šฐ-๋ ˆ์ดํŠธ ์œ„์ƒ ๊ฒ€์ถœ๊ธฐ๋ฅผ ํ™œ์šฉํ•œ ์ˆ˜์‹ ๊ธฐ ์„ค๊ณ„

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ(๋ฐ•์‚ฌ) -- ์„œ์šธ๋Œ€ํ•™๊ต๋Œ€ํ•™์› : ๊ณต๊ณผ๋Œ€ํ•™ ์ „๊ธฐยท์ •๋ณด๊ณตํ•™๋ถ€, 2021.8. ์—ผ์ œ์™„.In this thesis, designs of high-speed, low-power wireline receivers (RX) are explained. To be specific, the circuit techniques of DC offset cancellation, merged-summer DFE, stochastic Baud-rate CDR, and the phase detector (PD) for multi-level signal are proposed. At first, an RX with adaptive offset cancellation (AOC) and merged summer decision-feedback equalizer (DFE) is proposed. The proposed AOC engine removes the random DC offset of the data path by examining the random data stream's sampled data and edge outputs. In addition, the proposed RX incorporates a shared-summer DFE in a half-rate structure to reduce power dissipation and hardware complexity of the adaptive equalizer. A prototype chip fabricated in 40 nm CMOS technology occupies an active area of 0.083 mm2. Thanks to the AOC engine, the proposed RX achieves the BER of less than 10-12 in a wide range of data rates: 1.62-10 Gb/s. The proposed RX consumes 18.6 mW at 10 Gb/s over a channel with a 27 dB loss at 5 GHz, exhibiting a figure-of-merit of 0.068 pJ/b/dB. Secondly, a 40 nm CMOS RX with Baud-rate phase-detector (BRPD) is proposed. The RX includes two PDs: the BRPD employing the stochastic technique and the BRPD suitable for multi-level signals. Thanks to the Baud-rate CDRโ€™s advantage, by not using an edge-sampling clock, the proposed CDR can reduce the power consumption by lowering the hardware complexity. Besides, the proposed stochastic phase detector (SPD) tracks an optimal phase-locking point that maximizes the vertical eye opening. Furthermore, despite residual inter-symbol interference, proposed BRPD for multi-level signal secures vertical eye margin, which is especially vulnerable in the multi-level signal. Besides, the proposed BRPD has a unique lock point with an adaptive DFE, unlike conventional Mueller-Muller PD. A prototype chip fabricated in 40 nm CMOS technology occupies an active area of 0.24 mm2. The proposed PAM-4 RX achieves the bit-error-rate less than 10-11 in 48 Gb/s and the power efficiency of 2.42 pJ/b.๋ณธ ๋…ผ๋ฌธ์€ ๊ณ ์†, ์ €์ „๋ ฅ์œผ๋กœ ๋™์ž‘ํ•˜๋Š” ์œ ์„  ์ˆ˜์‹ ๊ธฐ์˜ ์„ค๊ณ„์— ๋Œ€ํ•ด ์„ค๋ช…ํ•˜๊ณ  ์žˆ๋‹ค. ๊ตฌ์ฒด์ ์œผ๋กœ ๋งํ•˜๋ฉด, ์˜คํ”„์…‹ ์ƒ์‡„, ๋ณ‘ํ•ฉ๋œ ์„œ๋จธ๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๊ฒฐ์ • ํ”ผ๋“œ๋ฐฑ ๋“ฑํ™”๊ธฐ ๊ธฐ์ˆ , ํ™•๋ฅ ์  ๋ณด์šฐ ๋ ˆ์ดํŠธ ํด๋Ÿญ๊ณผ ๋ฐ์ดํ„ฐ ๋ณต์›๊ธฐ, ๊ทธ๋ฆฌ๊ณ  ๋‹ค์ค‘ ๋ ˆ๋ฒจ ์‹ ํ˜ธ์— ์ ํ•ฉํ•œ ์œ„์ƒ ๊ฒ€์ถœ๊ธฐ๋ฅผ ์ œ์•ˆํ•œ๋‹ค. ์ฒซ์งธ๋กœ, ์ ์‘ ์˜คํ”„์…‹ ์ œ๊ฑฐ ๋ฐ ๋ณ‘ํ•ฉ๋œ ์„œ๋จธ๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๊ฒฐ์ • ํ”ผ๋“œ๋ฐฑ ๋“ฑํ™”๊ธฐ๋ฅผ ๊ฐ–์ถ˜ ์ˆ˜์‹ ๊ธฐ๋ฅผ ์ œ์•ˆํ•œ๋‹ค. ์ œ์•ˆ๋œ ์ ์‘ ์˜คํ”„์…‹ ์ œ๊ฑฐ ์—”์ง„์€ ์ž„์˜์˜ ๋ฐ์ดํ„ฐ ์ŠคํŠธ๋ฆผ์˜ ์ƒ˜ํ”Œ๋ง ๋ฐ์ดํ„ฐ, ์—์ง€ ์ถœ๋ ฅ์„ ๊ฒ€์‚ฌํ•˜์—ฌ ๋ฐ์ดํ„ฐ ๊ฒฝ๋กœ ์ƒ์˜ ์˜คํ”„์…‹์„ ์ œ๊ฑฐํ•œ๋‹ค. ๋˜ํ•œ ํ•˜ํ”„ ๋ ˆ์ดํŠธ ๊ตฌ์กฐ์˜ ๋ณ‘ํ•ฉ๋œ ์„œ๋จธ๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๊ฒฐ์ • ํ”ผ๋“œ๋ฐฑ ๋“ฑํ™”๊ธฐ๋Š” ์ „๋ ฅ์˜ ์‚ฌ์šฉ๊ณผ ํ•˜๋“œ์›จ์–ด์˜ ๋ณต์žก์„ฑ์„ ์ค„์ธ๋‹ค. 40 nm CMOS ๊ธฐ์ˆ ๋กœ ์ œ์ž‘๋œ ํ”„๋กœํ† ํƒ€์ž… ์นฉ์€ 0.083 mm2 ์˜ ๋ฉด์ ์„ ๊ฐ€์ง„๋‹ค. ์ ์‘ ์˜คํ”„์…‹ ์ œ๊ฑฐ๊ธฐ ๋•๋ถ„์— ์ œ์•ˆ๋œ ์ˆ˜์‹ ๊ธฐ๋Š” 10-12 ๋ฏธ๋งŒ์˜ BER์„ ๋‹ฌ์„ฑํ•œ๋‹ค. ๋˜ํ•œ ์ œ์•ˆ๋œ ์ˆ˜์‹ ๊ธฐ๋Š” 5GHz์—์„œ 27 dB์˜ ๋กœ์Šค๋ฅผ ๊ฐ–๋Š” ์ฑ„๋„์—์„œ 10 Gb/s์˜ ์†๋„์—์„œ 18.6 mW๋ฅผ ์†Œ๋น„ํ•˜๋ฉฐ 0.068 pJ/b/dB์˜ FoM์„ ๋‹ฌ์„ฑํ•˜์˜€๋‹ค. ๋‘๋ฒˆ์งธ๋กœ, ๋ณด์šฐ ๋ ˆ์ดํŠธ ์œ„์ƒ ๊ฒ€์ถœ๊ธฐ๊ฐ€ ์žˆ๋Š” 40 nm CMOS ์ˆ˜์‹ ๊ธฐ๊ฐ€ ์ œ์•ˆ๋˜์—ˆ๋‹ค. ์ˆ˜์‹ ๊ธฐ์—๋Š” ๋‘๊ฐœ์˜ ๋ณด์šฐ ๋ ˆ์ดํŠธ ์œ„์ƒ ๊ฒ€์ถœ๊ธฐ๋ฅผ ํฌํ•จํ•œ๋‹ค. ํ•˜๋‚˜๋Š” ํ™•๋ฅ ๋ก ์  ๊ธฐ๋ฒ•์„ ์‚ฌ์šฉํ•˜๋Š” ๋ณด์šฐ ๋ ˆ์ดํŠธ ์œ„์ƒ ๊ฒ€์ถœ๊ธฐ์ด๋‹ค. ๋ณด์šฐ ๋ ˆ์ดํŠธ ํด๋Ÿญ ๋ฐ์ดํ„ฐ ๋ณต์›๊ธฐ์˜ ์žฅ์  ๋•๋ถ„์— ์—์ง€ ์ƒ˜ํ”Œ๋ง ํด๋Ÿญ์„ ์‚ฌ์šฉํ•˜์ง€ ์•Š์Œ์œผ๋กœ์„œ ํŒŒ์›Œ์˜ ์†Œ๋ชจ์™€ ํ•˜๋“œ์›จ์–ด์˜ ๋ณต์žก์„ฑ์„ ์ค„์˜€๋‹ค. ๋˜ํ•œ ํ™•๋ฅ ์  ์œ„์ƒ ๊ฒ€์ถœ๊ธฐ๋Š” ์ˆ˜์ง ์•„์ด ์˜คํ”„๋‹์„ ์ตœ๋Œ€ํ™”ํ•˜๋Š” ์ตœ์ ์˜ ์œ„์ƒ ์ง€์ ์„ ์ฐพ์„ ์ˆ˜ ์žˆ์—ˆ๋‹ค. ๋‹ค๋ฅธ ์œ„์ƒ ๊ฒ€์ถœ๊ธฐ๋Š” ๋‹ค์ค‘ ๋ ˆ๋ฒจ ์‹ ํ˜ธ์— ์ ํ•ฉํ•œ ๋ฐฉ์‹์ด๋‹ค. ์‹ฌ๋ณผ ๊ฐ„ ๊ฐ„์„ญ์ด ๋‹ค์ค‘ ๋ ˆ๋ฒจ ์‹ ํ˜ธ์— ๋งค์šฐ ์ทจ์•ฝํ•œ ๋ฌธ์ œ๊ฐ€ ์žˆ๋”๋ผ๋„ ์ œ์•ˆ๋œ ๋‹ค์ค‘ ๋ ˆ๋ฒจ ์‹ ํ˜ธ์šฉ ๋ณด์šฐ ๋ ˆ์ดํŠธ ์œ„์ƒ ๊ฒ€์ถœ๊ธฐ๋Š” ์ˆ˜์ง ์•„์ด ๋งˆ์ง„์„ ํ™•๋ณดํ•œ๋‹ค. ๊ฒŒ๋‹ค๊ฐ€ ์ œ์•ˆ๋œ ๋ณด์šฐ ๋ ˆ์ดํŠธ ์œ„์ƒ ๊ฒ€์ถœ๊ธฐ๋Š” ๊ธฐ์กด์˜ ๋ฎฌ๋Ÿฌ-๋ฎ๋Ÿฌ ์œ„์ƒ ๊ฒ€์ถœ๊ธฐ์™€ ๋‹ฌ๋ฆฌ ์ ์‘ํ˜• ๊ฒฐ์ • ํ”ผ๋“œ๋ฐฑ ๋“ฑํ™”๊ธฐ๊ฐ€ ์žˆ๋”๋ผ๋„ ์œ ์ผํ•œ ๋ฝ ์ง€์ ์„ ๊ฐ–๋Š”๋‹ค. ํ”„๋กœํ† ํƒ€์ž… ์นฉ์€ 0.24mm2์˜ ๋ฉด์ ์„ ๊ฐ€์ง„๋‹ค. ์ œ์•ˆ๋œ PAM-4 ์ˆ˜์‹ ๊ธฐ๋Š” 48 Gb/s์˜ ์†๋„์—์„œ 10-11 ๋ฏธ๋งŒ์˜ BER์„ ๊ฐ€์ง€๊ณ , 2.42 pJ/b์˜ FoM์„ ๊ฐ€์ง„๋‹ค.CHAPTER 1 INTRODUCTION 1 1.1 MOTIVATION 1 1.2 THESIS ORGANIZATION 5 CHAPTER 2 BACKGROUNDS 6 2.1 BASIC ARCHITECTURE IN SERIAL LINK 6 2.1.1 SERIAL COMMUNICATION 6 2.1.2 CLOCK AND DATA RECOVERY 8 2.1.3 MULTI-LEVEL PULSE-AMPLITUDE MODULATION 10 2.2 EQUALIZER 12 2.2.1 EQUALIZER OVERVIEW 12 2.2.2 DECISION-FEEDBACK EQUALIZER 15 2.2.3 ADAPTIVE EQUALIZER 18 2.3 CLOCK RECOVERY 21 2.3.1 2X OVERSAMPLING PD ALEXANDER PD 22 2.3.2 BAUD-RATE PD MUELLER MULLER PD 25 CHAPTER 3 AN ADAPTIVE OFFSET CANCELLATION SCHEME AND SHARED SUMMER ADAPTIVE DFE 28 3.1 OVERVIEW 28 3.2 AN ADAPTIVE OFFSET CANCELLATION SCHEME AND SHARED-SUMMER ADAPTIVE DFE FOR LOW POWER RECEIVER 31 3.3 SHARED SUMMER DFE 37 3.4 RECEIVER IMPLEMENTATION 42 3.5 MEASUREMENT RESULTS 45 CHAPTER 4 PAM-4 BAUD-RATE DIGITAL CDR 51 4.1 OVERVIEW 51 4.2 OVERALL ARCHITECTURE 53 4.2.1 PROPOSED BAUD-RATE CDR ARCHITECTURE 53 4.2.2 PROPOSED ANALOG FRONT-END STRUCTURE 59 4.3 STOCHASTIC PHASE DETECTION PAM-4 CDR 64 4.3.1 PROPOSED STOCHASTIC PHASE DETECTION 64 4.3.2 COMPARISON OF THE STOCHASTIC PD WITH SS-MMPD 70 4.4 PHASE DETECTION FOR MULTI-LEVEL SIGNALING 73 4.4.1 PROPOSED BAUD-RATE PHASE DETECTOR FOR MULTI-LEVEL SIGNAL 73 4.4.2 DATA LEVEL AND DFE COEFFICIENT ADAPTATION 79 4.4.3 PROPOSED PHASE DETECTOR 84 4.5 MEASUREMENT RESULT 88 4.5.1 MEASUREMENT OF THE PROPOSED STOCHASTIC BAUD-RATE PHASE DETECTION 94 4.5.2 MEASUREMENT OF THE PROPOSED BAUD-RATE PHASE DETECTION FOR MULTI-LEVEL SIGNAL 97 CHAPTER 5 CONCLUSION 103 BIBLIOGRAPHY 105 ์ดˆ ๋ก 109๋ฐ•

    ๋ฉ”๋ชจ๋ฆฌ ์ธํ„ฐํŽ˜์ด์Šค๋ฅผ ์œ„ํ•œ 20Gbps๊ธ‰ ์ง๋ ฌํ™” ์†ก์ˆ˜์‹ ๊ธฐ ์„ค๊ณ„

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ (๋ฐ•์‚ฌ)-- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ์ „๊ธฐยท์ปดํ“จํ„ฐ๊ณตํ•™๋ถ€, 2013. 8. ์ •๋•๊ท .Various types of serial link for current and future memory interface are presented in this thesis. At first, PHY design for commercial GDDR3 memory is proposed. GDDR3 PHY is consists of read path, write path, command path. Write path and command path calibrate skew by using VDL (Variable delay line), while read path calibrates skew by using DLL (Delay locked loop) and VDL. There are four data channels and one command/address channel. Each data channel consists of one clock signal (DQS) and eight data signals (DQ). Data channel operates in 1.2Gbps (1.08Gbps~1.2Gbps), and command/address channel operates 600Mbps (540Mbps~600Mbps). In particular, DLL design for high speed and for SSN (simultaneous switching noise) is concentrated in this thesis. Secondly, serial link design for silicon photonics is proposed. Silicon photonics is the strongest candidate for next generation memory interface. Modulator driver for modulator, TIA (trans-impedance amplifier) and LA (limiting amplifier) for photo diode design are discussed. It operates above 12.5Gbps but it consumes much power 7.2mW/Gbps (transmitter core), 2mW/Gbps (receiver core) because it is connected with optical device which has large parasitic capacitance. Overall receiver which includes CDR (clock and data recovery) is also implemented. Many chips are fabricated in 65nm, 0.13um CMOS process. Finally, electrical serial link for 20Gbps memory link is proposed. Overall architecture is forwarded clocking architecture, and is very simple and intuitive. It does not need additional synchronizer. This open loop delay matched stream line receiver finds optimum sampling point with DCDL (Digitally controlled delay line) controller and expects to consume low power structurally. Only two phase half rate clock is transmitted through clock channel, but half rate time interleaved way sampling is performed by aid of initial value settable PRBS chaser. A CMOS Chip is fabricated by 65nm process and it occupies 2500um x 2500um (transceiver). It is expected that about 2.6mW(2.4mW)/Gbps (transmitter), 4.1mW(2.7mW)/Gbps (receiver). Power consumption improvement is expected in advanced process.ABSTRACT I CONTENTS V LIST OF FIGURES VII LIST OF TABLES XII CHAPTER 1 INTRODUCTION ๏ผ‘ 1.1 MOTIVATION ๏ผ‘ 1.2 THESIS ORGANIZATION ๏ผ‘๏ผ CHAPTER 2 A SERIAL LINK PHY DESIGN FOR GDDR3 MEMORY INTERFACE 11 2.1 INTRODUCTION 11 2.2 GDDR3 MEMORY INTERFACE ARCHITECTURE 12 2.2.1 READ PATH ARCHITECTURE 15 2.2.2 WRITE PATH ARCHITECTURE 17 2.2.3 COMMAND PATH ARCHITECTURE 19 2.3 DLL DESIGN FOR MEMORY INTERFACE 20 2.3.1 SSN(SIMULTANEOUS SWITCHING NOISE) 20 2.3.2 DLL ARCHITECTURE 21 2.3.3 VOLTAGE CONTROLLED DELAY LINE (VCDL) 22 2.3.4 HYSTERESIS COARSE LOCK DETECTOR (HCLD) 23 2.3.5 DYNAMIC PHASE DETECTOR AND CHARGE PUMP 26 2.4 SIMULATION RESULT 29 2.5 CONCLUSION 32 CHAPTER 3 OPTICAL FRONT-END SERIAL LINK DESIGN FOR 20 GBPS MEMORY INTERFACE 35 3.1 SILICON PHOTONICS INTRODUCTION 35 3.2 OPTICAL FRONT-END TRANSMITTER DESIGN 45 3.2.1 MODULATOR DRIVER REQUIREMENTS 46 3.2.2 MODULATOR DRIVER DESIGN - CURRENT MODE DRIVER 47 3.2.3 MODULATOR DRIVER DESIGN - CURRENT MODE DRIVER 50 3.3 OPTICAL FRONT-END RECEIVER DESIGN 55 3.3.1 OPTICAL RECEIVER BACK END REQUIREMENTS 56 3.3.2 OPTICAL RECEIVER BACK END DESIGN โ€“ TIA 57 3.3.3 OPTICAL RECEIVER BACK END DESIGN โ€“ LA, DRIVER 63 3.3.4 OPTICAL RECEIVER BACK END DESIGN โ€“ CDR 66 3.4 MEASUREMENT AND SIMULATION RESULTS 70 3.4.1 MEASUREMENT AND SIMULATION ENVIRONMENTS 70 3.4.2 OPTICAL TX FRONT END MEASUREMENT AND SIMULATION 74 3.4.3 OPTICAL RX FRONT END MEASUREMENT AND SIMULATION 77 3.4.4 OPTICAL RX BACK END SIMULATION 79 3.4.5 OPTICAL-ELECTRICAL OVERALL MEASUREMENTS 80 3.4.6 DIE PHOTO AND LAYOUT 82 3.5 CONCLUSION 86 CHAPTER 4 ELECTRICAL FRONT-END SERIAL LINK DESIGN FOR 20GBPS MEMORY INTERFACE 87 4.1 INTRODUCTION 87 4.2 CONVENTIONAL ELECTRICAL FRONT-END HIGH SPEED SERIAL LINK ARCHITECTURES 90 4.3 DESIGN CONCEPT AND PROPOSED SERIAL LINK ARCHITECTURE โ€“ OPEN LOOP DELAY MATCHED STREAM LINED RECEIVER. 95 4.3.1 PROPOSED OVERALL ARCHITECTURE 95 4.3.2 DESIGN CONCEPT 97 4.3.3 PROPOSED PROTOCOL AND LOCKING PROCESS 100 4.4 OPTIMUM POINT SEARCH ALGORITHM BASED DCDL CONTROLLER DESIGN 102 4.5 DCDL (DIGITALLY CONTROLLED DELAY LINE) DESIGN 112 4.6 DFE (DECISION FEEDBACK EQUALIZER) AND OTHER BLOCKS DESIGN 115 4.7 SIMULATION RESULTS 117 4.8 POWER EXPECTATION AND CHIP LAYOUT 122 4.9 CONCLUSION 124 CHAPTER 5 CONCLUSION 126 BIBLIOGRAPHY 128Docto

    Architectural & circuit level techniques to improve energy efficiency of high speed serial links

    Get PDF
    High performance computing and communication are two key aspects of all information processing systems. With aggressive scaling of silicon technology enabling integration of a large number of transistors in a small area, managing power and thermal reliability has become very challenging. While lowering the power needed for performing computation has been the prime focus for decades, energy consumed for data transfer has recently become a major bottleneck especially in high performance applications. The focus of this thesis is on improving energy efficiency of communication links by exploring design techniques at both the architectural and circuit levels. In the first part of this work, we propose a time-based equalization scheme to implement transmit de-emphasis in voltage-mode output drivers. Using two-level pulse-width modulation, it overcomes the tradeoff between impedance matching, output swing, and de-emphasis resolution in conventional voltage-mode drivers. A prototype PWM-based 5โ€‰\,Gb/s voltage-mode transmitter was implemented in a 90โ€‰\,nm CMOS process and characterized across different channels and output swings to demonstrate the effectiveness of proposed techniques. The horizontal/vertical eye openings (BER=10โˆ’12\rm 10^{-12}) at the ends of 60โ€‰\,inch and 96โ€‰\,inch stripline channels are 78โ€‰\,mV/0.6โ€‰\,UI and 8โ€‰\,mV/0.3โ€‰\,UI, respectively. This transmitter achieves an energy efficiency of 3.1โ€‰\,mW/Gb/s while compensating for 16-28โ€‰\,dB channel loss, which compares favorably with the state-of-the-art. In the second part, techniques to improve energy efficiency of a complete transceiver are presented. The transmitter employs a novel partially segmented voltage-mode output driver to lower power consumption in pre-drivers during 2-tap FIR equalization. The receiver implements a low power half-rate clock and data recovery with the proposed ring PLL based multi-phase sampling clock generation in CDR loop and charge-based sampling and deserialization. These techniques are verified using the measured results obtained from a 14Gb/s transceiver prototype. Transmitter achieves an energy efficiency of 0.89โ€‰\,mW/Gb/s while securing a 0.36โ€‰\,UI sampling time margin with BER=10โˆ’12\rm{BER=10^{-12}} at the end of the channel with 11โ€‰\,dB loss at Nyquist frequency. The receiver recovers sampling clock with 1.8โ€‰\,psrms\rm{ps_{rms}} long term absolute jitter while recovering 14โ€‰\,Gb/s data at BER=10โˆ’12\rm{BER=10^{-12}}. The receiver achieves an energy efficiency of 1.69โ€‰\,mW/Gb/s. Transmitter and receiver share an LC PLL, which achieves 0.605โ€‰\,psrms\rm{ps_{rms}} integrated jitter at 7โ€‰\,GHz output with an energy efficiency of 0.5โ€‰\,mW/GHz. The transceiver as a whole achieves an energy efficiency of 2.8โ€‰\,mW/Gb/s

    Digital Centric Multi-Gigabit SerDes Design and Verification

    Get PDF
    Advances in semiconductor manufacturing still lead to ever decreasing feature sizes and constantly allow higher degrees of integration in application specific integrated circuits (ASICs). Therefore the bandwidth requirements on the external interfaces of such systems on chips (SoC) are steadily growing. Yet, as the number of pins on these ASICs is not increasing in the same pace - known as pin limitation - the bandwidth per pin has to be increased. SerDes (Serializer/Deserializer) technology, which allows to transfer data serially at very high data rates of 25Gbps and more is a key technology to overcome pin limitation and exploit the computing power that can be achieved in todays SoCs. As such SerDes blocks together with the digital logic interfacing them form complex mixed signal systems, verification of performance and functional correctness is very challenging. In this thesis a novel mixed-signal design methodology is proposed, which tightly couples model and implementation in order to ensure consistency throughout the design cycles and hereby accelerate the overall implementation flow. A tool flow that has been developed is presented, which integrates well into state of the art electronic design automation (EDA) environments and enables the usage of this methodology in practice. Further, the design space of todays high-speed serial links is analyzed and an architecture is proposed, which pushes complexity into the digital domain in order to achieve robustness, portability between manufacturing processes and scaling with advanced node technologies. The all digital phase locked loop (PLL) and clock data recovery (CDR), which have been developed are described in detail. The developed design flow was used for the implementation of the SerDes architecture in a 28nm silicon process and proved to be indispensable for future projects

    ์ฐจ์„ธ๋Œ€ HBM ์šฉ ๊ณ ์ง‘์ , ์ €์ „๋ ฅ ์†ก์ˆ˜์‹ ๊ธฐ ์„ค๊ณ„

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ (๋ฐ•์‚ฌ) -- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ๊ณต๊ณผ๋Œ€ํ•™ ์ „๊ธฐยท์ •๋ณด๊ณตํ•™๋ถ€, 2020. 8. ์ •๋•๊ท .This thesis presents design techniques for high-density power-efficient transceiver for the next-generation high bandwidth memory (HBM). Unlike the other memory interfaces, HBM uses a 3D-stacked package using through-silicon via (TSV) and a silicon interposer. The transceiver for HBM should be able to solve the problems caused by the 3D-stacked package and TSV. At first, a data (DQ) receiver for HBM with a self-tracking loop that tracks a phase skew between DQ and data strobe (DQS) due to a voltage or thermal drift is proposed. The self-tracking loop achieves low power and small area by uti-lizing an analog-assisted baud-rate phase detector. The proposed pulse-to-charge (PC) phase detector (PD) converts the phase skew to a voltage differ-ence and detects the phase skew from the voltage difference. An offset calibra-tion scheme that can compensates for a mismatch of the PD is also proposed. The proposed calibration scheme operates without any additional sensing cir-cuits by taking advantage of the write training of HBM. Fabricated in 65 nm CMOS, the DQ receiver shows a power efficiency of 370 fJ/b at 4.8 Gb/s and occupies 0.0056 mm2. The experimental results show that the DQ receiver op-erates without any performance degradation under a ยฑ 10% supply variation. In a second prototype IC, a high-density transceiver for HBM with a feed-forward-equalizer (FFE)-combined crosstalk (XT) cancellation scheme is pre-sented. To compensate for the XT, the transmitter pre-distorts the amplitude of the FFE output according to the XT. Since the proposed XT cancellation (XTC) scheme reuses the FFE implemented to equalize the channel loss, additional circuits for the XTC is minimized. Thanks to the XTC scheme, a channel pitch can be significantly reduced, allowing for the high channel density. Moreover, the 3D-staggered channel structure removes the ground layer between the verti-cally adjacent channels, which further reduces a cross-sectional area of the channel per lane. The test chip including 6 data lanes is fabricated in 65 nm CMOS technology. The 6-mm channels are implemented on chip to emulate the silicon interposer between the HBM and the processor. The operation of the XTC scheme is verified by simultaneously transmitting 4-Gb/s data to the 6 consecutive channels with 0.5-um pitch and the XTC scheme reduces the XT-induced jitter up to 78 %. The measurement result shows that the transceiver achieves the throughput of 8 Gb/s/um. The transceiver occupies 0.05 mm2 for 6 lanes and consumes 36.6 mW at 6 x 4 Gb/s.๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ์ฐจ์„ธ๋Œ€ HBM์„ ์œ„ํ•œ ๊ณ ์ง‘์  ์ €์ „๋ ฅ ์†ก์ˆ˜์‹ ๊ธฐ ์„ค๊ณ„ ๋ฐฉ๋ฒ•์„ ์ œ์•ˆํ•œ๋‹ค. ์ฒซ ๋ฒˆ์งธ๋กœ, ์ „์•• ๋ฐ ์˜จ๋„ ๋ณ€ํ™”์— ์˜ํ•œ ๋ฐ์ดํ„ฐ์™€ ํด๋Ÿญ ๊ฐ„ ์œ„์ƒ ์ฐจ์ด๋ฅผ ๋ณด์ƒํ•  ์ˆ˜ ์žˆ๋Š” ์ž์ฒด ์ถ”์  ๋ฃจํ”„๋ฅผ ๊ฐ€์ง„ ๋ฐ์ดํ„ฐ ์ˆ˜์‹ ๊ธฐ๋ฅผ ์ œ์•ˆํ•œ๋‹ค. ์ œ์•ˆํ•˜๋Š” ์ž์ฒด ์ถ”์  ๋ฃจํ”„๋Š” ๋ฐ์ดํ„ฐ ์ „์†ก ์†๋„์™€ ๊ฐ™์€ ์†๋„๋กœ ๋™์ž‘ํ•˜๋Š” ์œ„์ƒ ๊ฒ€์ถœ๊ธฐ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ „๋ ฅ ์†Œ๋ชจ์™€ ๋ฉด์ ์„ ์ค„์˜€๋‹ค. ๋˜ํ•œ ๋ฉ”๋ชจ๋ฆฌ์˜ ์“ฐ๊ธฐ ํ›ˆ๋ จ (write training) ๊ณผ์ •์„ ์ด์šฉํ•˜์—ฌ ํšจ๊ณผ์ ์œผ๋กœ ์œ„์ƒ ๊ฒ€์ถœ๊ธฐ์˜ ์˜คํ”„์…‹์„ ๋ณด์ƒํ•  ์ˆ˜ ์žˆ๋Š” ๋ฐฉ๋ฒ•์„ ์ œ์•ˆํ•œ๋‹ค. ์ œ์•ˆํ•˜๋Š” ๋ฐ์ดํ„ฐ ์ˆ˜์‹ ๊ธฐ๋Š” 65 nm ๊ณต์ •์œผ๋กœ ์ œ์ž‘๋˜์–ด 4.8 Gb/s์—์„œ 370 fJ/b์„ ์†Œ๋ชจํ•˜์˜€๋‹ค. ๋˜ํ•œ 10 % ์˜ ์ „์•• ๋ณ€ํ™”์— ๋Œ€ํ•˜์—ฌ ์•ˆ์ •์ ์œผ๋กœ ๋™์ž‘ํ•˜๋Š” ๊ฒƒ์„ ํ™•์ธํ•˜์˜€๋‹ค. ๋‘ ๋ฒˆ์งธ๋กœ, ํ”ผ๋“œ ํฌ์›Œ๋“œ ์ดํ€„๋ผ์ด์ €์™€ ๊ฒฐํ•ฉ๋œ ํฌ๋กœ์Šค ํ† ํฌ ๋ณด์ƒ ๋ฐฉ์‹์„ ํ™œ์šฉํ•œ ๊ณ ์ง‘์  ์†ก์ˆ˜์‹ ๊ธฐ๋ฅผ ์ œ์•ˆํ•œ๋‹ค. ์ œ์•ˆํ•˜๋Š” ์†ก์‹ ๊ธฐ๋Š” ํฌ๋กœ์Šค ํ† ํฌ ํฌ๊ธฐ์— ํ•ด๋‹นํ•˜๋Š” ๋งŒํผ ์†ก์‹ ๊ธฐ ์ถœ๋ ฅ์„ ์™œ๊ณกํ•˜์—ฌ ํฌ๋กœ์Šค ํ† ํฌ๋ฅผ ๋ณด์ƒํ•œ๋‹ค. ์ œ์•ˆํ•˜๋Š” ํฌ๋กœ์Šค ํ† ํฌ ๋ณด์ƒ ๋ฐฉ์‹์€ ์ฑ„๋„ ์†์‹ค์„ ๋ณด์ƒํ•˜๊ธฐ ์œ„ํ•ด ๊ตฌํ˜„๋œ ํ”ผ๋“œ ํฌ์›Œ๋“œ ์ดํ€„๋ผ์ด์ €๋ฅผ ์žฌํ™œ์šฉํ•จ์œผ๋กœ์จ ์ถ”๊ฐ€์ ์ธ ํšŒ๋กœ๋ฅผ ์ตœ์†Œํ™”ํ•œ๋‹ค. ์ œ์•ˆํ•˜๋Š” ์†ก์ˆ˜์‹ ๊ธฐ๋Š” ํฌ๋กœ์Šค ํ† ํฌ๊ฐ€ ๋ณด์ƒ ๊ฐ€๋Šฅํ•˜๊ธฐ ๋•Œ๋ฌธ์—, ์ฑ„๋„ ๊ฐ„๊ฒฉ์„ ํฌ๊ฒŒ ์ค„์—ฌ ๊ณ ์ง‘์  ํ†ต์‹ ์„ ๊ตฌํ˜„ํ•˜์˜€๋‹ค. ๋˜ํ•œ ์ง‘์ ๋„๋ฅผ ๋” ์ฆ๊ฐ€์‹œํ‚ค๊ธฐ ์œ„ํ•ด ์„ธ๋กœ๋กœ ์ธ์ ‘ํ•œ ์ฑ„๋„ ์‚ฌ์ด์˜ ์ฐจํ ์ธต์„ ์ œ๊ฑฐํ•œ ์ ์ธต ์ฑ„๋„ ๊ตฌ์กฐ๋ฅผ ์ œ์•ˆํ•œ๋‹ค. 6๊ฐœ์˜ ์†ก์ˆ˜์‹ ๊ธฐ๋ฅผ ํฌํ•จํ•œ ํ”„๋กœํ† ํƒ€์ž… ์นฉ์€ 65 nm ๊ณต์ •์œผ๋กœ ์ œ์ž‘๋˜์—ˆ๋‹ค. HBM๊ณผ ํ”„๋กœ์„ธ์„œ ์‚ฌ์ด์˜ silicon interposer channel ์„ ๋ชจ์‚ฌํ•˜๊ธฐ ์œ„ํ•œ 6 mm ์˜ ์ฑ„๋„์ด ์นฉ ์œ„์— ๊ตฌํ˜„๋˜์—ˆ๋‹ค. ์ œ์•ˆํ•˜๋Š” ํฌ๋กœ์Šค ํ† ํฌ ๋ณด์ƒ ๋ฐฉ์‹์€ 0.5 um ๊ฐ„๊ฒฉ์˜ 6๊ฐœ์˜ ์ธ์ ‘ํ•œ ์ฑ„๋„์— ๋™์‹œ์— ๋ฐ์ดํ„ฐ๋ฅผ ์ „์†กํ•˜์—ฌ ๊ฒ€์ฆ๋˜์—ˆ์œผ๋ฉฐ, ํฌ๋กœ์Šค ํ† ํฌ๋กœ ์ธํ•œ ์ง€ํ„ฐ๋ฅผ ์ตœ๋Œ€ 78 % ๊ฐ์†Œ์‹œ์ผฐ๋‹ค. ์ œ์•ˆํ•˜๋Š” ์†ก์ˆ˜์‹ ๊ธฐ๋Š” 8 Gb/s/um ์˜ ์ฒ˜๋ฆฌ๋Ÿ‰์„ ๊ฐ€์ง€๋ฉฐ 6 ๊ฐœ์˜ ์†ก์ˆ˜์‹ ๊ธฐ๊ฐ€ ์ด 36.6 mW์˜ ์ „๋ ฅ์„ ์†Œ๋ชจํ•˜์˜€๋‹ค.CHAPTER 1 INTRODUCTION 1 1.1 MOTIVATION 1 1.2 THESIS ORGANIZATION 4 CHAPTER 2 BACKGROUND ON HIGH-BANDWIDTH MEMORY 6 2.1 OVERVIEW 6 2.2 TRANSCEIVER ARCHITECTURE 10 2.3 READ/WRITE OPERATION 15 2.3.1 READ OPERATION 15 2.3.2 WRITE OPERATION 19 CHAPTER 3 BACKGROUNDS ON COUPLED WIRES 21 3.1 GENERALIZED MODEL 21 3.2 EFFECT OF CROSSTALK 26 CHAPTER 4 DQ RECEIVER WITH BAUD-RATE SELF-TRACKING LOOP 29 4.1 OVERVIEW 29 4.2 FEATURES OF DQ RECEIVER FOR HBM 33 4.3 PROPOSED PULSE-TO-CHARGE PHASE DETECTOR 35 4.3.1 OPERATION OF PULSE-TO-CHARGE PHASE DETECTOR 35 4.3.2 OFFSET CALIBRATION 37 4.3.3 OPERATION SEQUENCE 39 4.4 CIRCUIT IMPLEMENTATION 42 4.5 MEASUREMENT RESULT 46 CHAPTER 5 HIGH-DENSITY TRANSCEIVER FOR HBM WITH 3D-STAGGERED CHANNEL AND CROSSTALK CANCELLATION SCHEME 57 5.1 OVERVIEW 57 5.2 PROPOSED 3D-STAGGERED CHANNEL 61 5.2.1 IMPLEMENTATION OF 3D-STAGGERED CHANNEL 61 5.2.2 CHANNEL CHARACTERISTICS AND MODELING 66 5.3 PROPOSED FEED-FORWARD-EQUALIZER-COMBINED CROSSTALK CANCELLATION SCHEME 72 5.4 CIRCUIT IMPLEMENTATION 77 5.4.1 OVERALL ARCHITECTURE 77 5.4.2 TRANSMITTER WITH FFE-COMBINED XTC 79 5.4.3 RECEIVER 81 5.5 MEASUREMENT RESULT 82 CHAPTER 6 CONCLUSION 93 BIBLIOGRAPHY 95 ์ดˆ ๋ก 102Docto

    ๋ฐ์ดํ„ฐ ์ „์†ก๋กœ ํ™•์žฅ์„ฑ๊ณผ ๋ฃจํ”„ ์„ ํ˜•์„ฑ์„ ํ–ฅ์ƒ์‹œํ‚จ ๋‹ค์ค‘์ฑ„๋„ ์ˆ˜์‹ ๊ธฐ๋“ค์— ๊ด€ํ•œ ์—ฐ๊ตฌ

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ (๋ฐ•์‚ฌ)-- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ์ „๊ธฐยท์ปดํ“จํ„ฐ๊ณตํ•™๋ถ€, 2013. 2. ์ •๋•๊ท .Two types of serial data communication receivers that adopt a multichannel architecture for a high aggregate I/O bandwidth are presented. Two techniques for collaboration and sharing among channels are proposed to enhance the loop-linearity and channel-expandability of multichannel receivers, respectively. The first proposed receiver employs a collaborative timing scheme recovery which relies on the sharing of all outputs of phase detectors (PDs) among channels to extract common information about the timing and multilevel signaling architecture of PAM-4. The shared timing information is processed by a common global loop filter and is used to update the phase of the voltage-controlled oscillator with better rejection of per-channel noise. In addition to collaborative timing recovery, a simple linearization technique for binary PDs is proposed. The technique realizes a high-rate oversampling PD while the hardware cost is equivalent to that of a conventional 2x-oversampling clock and data recovery. The first receiver exploiting the collaborative timing recovery architecture is designed using 45-nm CMOS technology. A single data lane occupies a 0.195-mm2 area and consumes a relatively low 17.9 mW at 6 Gb/s at 1.0V. Therefore, the power efficiency is 2.98 mW/Gb/s. The simulated jitter is about 0.034 UI RMS given an input jitter value of 0.03 UI RMS, while the relatively constant loop bandwidth with the PD linearization technique is about 7.3-MHz regardless of the data-stream noise. Unlike the first receiver, the second proposed multichannel receiver was designed to reduce the hardware complexity of each lane. The receiver employs shared calibration logic among channels and yet achieves superior channel expandability with slim data lanes. A shared global calibration control, which is used in a forwarded clock receiver based on a multiphase delay-locked loop, accomplishes skew calibration, equalizer adaptation, and the phase lock of all channels during a calibration period, resulting in reduced hardware overhead and less area required by each data lane. The second forwarded clock receiver is designed in 90-nm CMOS technology. It achieves error-free eye openings of more than 0.5 UI across 9โˆ’ 28 inch Nelco 4000-6 microstrips at 4โˆ’ 7 Gb/s and more than 0.42 UI at data rates of up to 9 Gb/s. The data lane occupies only 0.152 mm2 and consumes 69.8 mW, while the rest of the receiver occupies 0.297 mm2 and consumes 56 mW at a data rate of 7 Gb/s and a supply voltage of 1.35 V.1. Introduction 1 1.1 Motivations 1.2 Thesis Organization 2. Previous Receivers for Serial-Data Communications 2.1 Classification of the Links 2.2 Clocking architecture of transceivers 2.3 Components of receiver 2.3.1 Channel loss 2.3.2 Equalizer 2.3.3 Clock and data recovery circuit 2.3.3.1. Basic architecture 2.3.3.2. Phase detector 2.3.3.2.1. Linear phase detector 2.3.3.2.2. Binary phase detector 2.3.3.3. Frequency detector 2.3.3.4. Charge pump 2.3.3.5. Voltage controlled oscillator and delay-line 2.3.4 Loop dynamics of PLL 2.3.5 Loop dynamics of DLL 3. The Proposed PLL-Based Receiver with Loop Linearization Technique 3.1 Introduction 3.2 Motivation 3.3 Overview of binary phase detection 3.4 The proposed BBPD linearization technique 3.4.1 Architecture of the proposed PLL-based receiver 3.4.2 Linearization technique of binary phase detection 3.4.3 Rotational pattern of sampling phase offset 3.5 PD gain analysis and optimization 3.6 Loop Dynamics of the 2nd-order CDR 3.7 Verification with the time-accurate behavioral simulation 3.8 Summary 4. The Proposed DLL-Based Receiver with Forwarded-Clock 4.1 Introduction 4.2 Motivation 4.3 Design consideration 4.4 Architecture of the proposed forwarded-clock receiver 4.5 Circuit description 4.5.1 Analog multi-phase DLL 4.5.2 Dual-input interpolating deley cells 4.5.3 Dedicated half-rate data samplers 4.5.4 Cherry-Hooper continuous-time linear equalizer 4.5.5 Equalizer adaptation and phase-lock scheme 4.6 Measurement results 5. Conclusion 6. BibliographyDocto

    Power-Proportional Optical Links

    Get PDF
    The continuous increase in data transfer rate in short-reach links, such as chip-to-chip and between servers within a data-center, demands high-speed links. As power efficiency becomes ever more important in these links, power-efficient optical links need to be designed. Power efficiency in a link can be achieved by enabling power-proportional communication over the serial link. In power-proportional links, the power dissipated by a link is proportional to the amount of data communicated. Normally, data-rate demand is not constant, and the peak data-rate is not required all the time. If a link is not adapted according to the data-rate demand, there will be a fixed power dissipation, and the power efficiency of the link will degrade during the sub-maximal link utilization. Adapting links to real-time data-rate requirements reduces power dissipation. Power proportionality is achieved by scaling the power of the serial link linearly with the link utilization, and techniques such as variable data-rate and burst-mode can be adopted for this purpose. Links whose data rate (and hence power dissipation) can be varied in response to system demands are proposed in this work. Past works have presented rapidly reconfigurable bandwidth in variable data-rate receivers, allowing lower power dissipation for lower data-rate operation. However, maintaining synchronization during reconfiguration was not possible since previous approaches have introduced changes in front-end delay when they are reconfigured. This work presents a technique that allows rapid bandwidth adjustment while maintaining a near-constant delay through the receiver suitable for a power-scalable variable data-rate optical link. Measurements of a fabricated integrated circuit (IC) show nearly constant energy per bit across a 2ร— variation in data rate while introducing less than 10 % of a unit interval (UI) of delay variation. With continuously increasing data communication in data-centers, parallel optical links with ever-increasing per-lane data rates are being used to meet overall throughput demands. Simultaneously, power efficiency is becoming increasingly important for these links since they do not transmit useful data all the time. The burst-mode solution for vertical-cavity surface-emitting laser (VCSEL)-based point-to-point communication can be used to improve linksโ€™ energy efficiency during low link activity. The burst-mode technique for VCSEL-based links has not yet been deployed commercially. Past works have presented burst-mode solutions for single-channel receivers, allowing lower power dissipation during low link activity and solutions for fast activation of the receivers. However, this work presents a novel technique that allows rapid activation of a front-end and fast locking of a clock-and-data-recovery (CDR) for a multi-channel parallel link, utilizing opportunities arising from the parallel nature of many VCSEL-based links. The idea has been demonstrated through electrical and optical measurements of a fabricated IC at 10 Gbps, which show fast data detection and activation of the circuitry within 49 UIs while allowing the front-end to achieve better energy efficiency during low link activity. Simulation results are also presented in support of the proposed technique which allows the CDR to lock within 26 UIs from when it is powered on

    Design of energy efficient high speed I/O interfaces

    Get PDF
    Energy efficiency has become a key performance metric for wireline high speed I/O interfaces. Consequently, design of low power I/O interfaces has garnered large interest that has mostly been focused on active power reduction techniques at peak data rate. In practice, most systems exhibit a wide range of data transfer patterns. As a result, low energy per bit operation at peak data rate does not necessarily translate to overall low energy operation. Therefore, I/O interfaces that can scale their power consumption with data rate requirement are desirable. Rapid on-off I/O interfaces have a potential to scale power with data rate requirements without severely affecting either latency or the throughput of the I/O interface. In this work, we explore circuit techniques for designing rapid on-off high speed wireline I/O interfaces and digital fractional-N PLLs. A burst-mode transmitter suitable for rapid on-off I/O interfaces is presented that achieves 6 ns turn-on time by utilizing a fast frequency settling ring oscillator in digital multiplying delay-locked loop and a rapid on-off biasing scheme for current mode output driver. Fabricated in 90 nm CMOS process, the prototype achieves 2.29 mW/Gb/s energy efficiency at peak data rate of 8 Gb/s. A 125X (8 Gb/s to 64 Mb/s) change in effective data rate results in 67X (18.29 mW to 0.27 mW) change in transmitter power consumption corresponding to only 2X (2.29 mW/Gb/s to 4.24 mW/Gb/s) degradation in energy efficiency for 32-byte long data bursts. We also present an analytical bit error rate (BER) computation technique for this transmitter under rapid on-off operation, which uses MDLL settling measurement data in conjunction with always-on transmitter measurements. This technique indicates that the BER bathtub width for 10^(โˆ’12) BER is 0.65 UI and 0.72 UI during rapid on-off operation and always-on operation, respectively. Next, a pulse response estimation-based technique is proposed enabling burst-mode operation for baud-rate sampling receivers that operate over high loss channels. Such receivers typically employ discrete time equalization to combat inter-symbol interference. Implementation details are provided for a receiver chip, fabricated in 65nm CMOS technology, that demonstrates efficacy of the proposed technique. A low complexity pulse response estimation technique is also presented for low power receivers that do not employ discrete time equalizers. We also present techniques for implementation of highly digital fractional-N PLL employing a phase interpolator based fractional divider to improve the quantization noise shaping properties of a 1-bit โˆ†ฮฃ frequency-to-digital converter. Fabricated in 65nm CMOS process, the prototype calibration-free fractional-N Type-II PLL employs the proposed frequency-to-digital converter in place of a high resolution time-to-digital converter and achieves 848 fs rms integrated jitter (1 kHz-30 MHz) and -101 dBc/Hz in-band phase noise while generating 5.054 GHz output from 31.25 MHz input
    • โ€ฆ
    corecore