13 research outputs found

    μ°¨μ„ΈλŒ€ HBM 용 고집적, μ €μ „λ ₯ μ†‘μˆ˜μ‹ κΈ° 섀계

    Get PDF
    ν•™μœ„λ…Όλ¬Έ (박사) -- μ„œμšΈλŒ€ν•™κ΅ λŒ€ν•™μ› : κ³΅κ³ΌλŒ€ν•™ 전기·정보곡학뢀, 2020. 8. 정덕균.This thesis presents design techniques for high-density power-efficient transceiver for the next-generation high bandwidth memory (HBM). Unlike the other memory interfaces, HBM uses a 3D-stacked package using through-silicon via (TSV) and a silicon interposer. The transceiver for HBM should be able to solve the problems caused by the 3D-stacked package and TSV. At first, a data (DQ) receiver for HBM with a self-tracking loop that tracks a phase skew between DQ and data strobe (DQS) due to a voltage or thermal drift is proposed. The self-tracking loop achieves low power and small area by uti-lizing an analog-assisted baud-rate phase detector. The proposed pulse-to-charge (PC) phase detector (PD) converts the phase skew to a voltage differ-ence and detects the phase skew from the voltage difference. An offset calibra-tion scheme that can compensates for a mismatch of the PD is also proposed. The proposed calibration scheme operates without any additional sensing cir-cuits by taking advantage of the write training of HBM. Fabricated in 65 nm CMOS, the DQ receiver shows a power efficiency of 370 fJ/b at 4.8 Gb/s and occupies 0.0056 mm2. The experimental results show that the DQ receiver op-erates without any performance degradation under a Β± 10% supply variation. In a second prototype IC, a high-density transceiver for HBM with a feed-forward-equalizer (FFE)-combined crosstalk (XT) cancellation scheme is pre-sented. To compensate for the XT, the transmitter pre-distorts the amplitude of the FFE output according to the XT. Since the proposed XT cancellation (XTC) scheme reuses the FFE implemented to equalize the channel loss, additional circuits for the XTC is minimized. Thanks to the XTC scheme, a channel pitch can be significantly reduced, allowing for the high channel density. Moreover, the 3D-staggered channel structure removes the ground layer between the verti-cally adjacent channels, which further reduces a cross-sectional area of the channel per lane. The test chip including 6 data lanes is fabricated in 65 nm CMOS technology. The 6-mm channels are implemented on chip to emulate the silicon interposer between the HBM and the processor. The operation of the XTC scheme is verified by simultaneously transmitting 4-Gb/s data to the 6 consecutive channels with 0.5-um pitch and the XTC scheme reduces the XT-induced jitter up to 78 %. The measurement result shows that the transceiver achieves the throughput of 8 Gb/s/um. The transceiver occupies 0.05 mm2 for 6 lanes and consumes 36.6 mW at 6 x 4 Gb/s.λ³Έ λ…Όλ¬Έμ—μ„œλŠ” μ°¨μ„ΈλŒ€ HBM을 μœ„ν•œ 고집적 μ €μ „λ ₯ μ†‘μˆ˜μ‹ κΈ° 섀계 방법을 μ œμ•ˆν•œλ‹€. 첫 번째둜, μ „μ•• 및 μ˜¨λ„ 변화에 μ˜ν•œ 데이터와 클럭 κ°„ μœ„μƒ 차이λ₯Ό 보상할 수 μžˆλŠ” 자체 좔적 루프λ₯Ό 가진 데이터 μˆ˜μ‹ κΈ°λ₯Ό μ œμ•ˆν•œλ‹€. μ œμ•ˆν•˜λŠ” 자체 좔적 λ£¨ν”„λŠ” 데이터 전솑 속도와 같은 μ†λ„λ‘œ λ™μž‘ν•˜λŠ” μœ„μƒ κ²€μΆœκΈ°λ₯Ό μ‚¬μš©ν•˜μ—¬ μ „λ ₯ μ†Œλͺ¨μ™€ 면적을 μ€„μ˜€λ‹€. λ˜ν•œ λ©”λͺ¨λ¦¬μ˜ μ“°κΈ° ν›ˆλ ¨ (write training) 과정을 μ΄μš©ν•˜μ—¬ 효과적으둜 μœ„μƒ κ²€μΆœκΈ°μ˜ μ˜€ν”„μ…‹μ„ 보상할 수 μžˆλŠ” 방법을 μ œμ•ˆν•œλ‹€. μ œμ•ˆν•˜λŠ” 데이터 μˆ˜μ‹ κΈ°λŠ” 65 nm κ³΅μ •μœΌλ‘œ μ œμž‘λ˜μ–΄ 4.8 Gb/sμ—μ„œ 370 fJ/b을 μ†Œλͺ¨ν•˜μ˜€λ‹€. λ˜ν•œ 10 % 의 μ „μ•• 변화에 λŒ€ν•˜μ—¬ μ•ˆμ •μ μœΌλ‘œ λ™μž‘ν•˜λŠ” 것을 ν™•μΈν•˜μ˜€λ‹€. 두 번째둜, ν”Όλ“œ ν¬μ›Œλ“œ 이퀄라이저와 κ²°ν•©λœ 크둜슀 토크 보상 방식을 ν™œμš©ν•œ 고집적 μ†‘μˆ˜μ‹ κΈ°λ₯Ό μ œμ•ˆν•œλ‹€. μ œμ•ˆν•˜λŠ” μ†‘μ‹ κΈ°λŠ” 크둜슀 토크 크기에 ν•΄λ‹Ήν•˜λŠ” 만큼 솑신기 좜λ ₯을 μ™œκ³‘ν•˜μ—¬ 크둜슀 토크λ₯Ό λ³΄μƒν•œλ‹€. μ œμ•ˆν•˜λŠ” 크둜슀 토크 보상 방식은 채널 손싀을 λ³΄μƒν•˜κΈ° μœ„ν•΄ κ΅¬ν˜„λœ ν”Όλ“œ ν¬μ›Œλ“œ 이퀄라이저λ₯Ό μž¬ν™œμš©ν•¨μœΌλ‘œμ¨ 좔가적인 회둜λ₯Ό μ΅œμ†Œν™”ν•œλ‹€. μ œμ•ˆν•˜λŠ” μ†‘μˆ˜μ‹ κΈ°λŠ” 크둜슀 토크가 보상 κ°€λŠ₯ν•˜κΈ° λ•Œλ¬Έμ—, 채널 간격을 크게 쀄여 고집적 톡신을 κ΅¬ν˜„ν•˜μ˜€λ‹€. λ˜ν•œ 집적도λ₯Ό 더 μ¦κ°€μ‹œν‚€κΈ° μœ„ν•΄ μ„Έλ‘œλ‘œ μΈμ ‘ν•œ 채널 μ‚¬μ΄μ˜ 차폐 측을 μ œκ±°ν•œ 적측 채널 ꡬ쑰λ₯Ό μ œμ•ˆν•œλ‹€. 6개의 μ†‘μˆ˜μ‹ κΈ°λ₯Ό ν¬ν•¨ν•œ ν”„λ‘œν† νƒ€μž… 칩은 65 nm κ³΅μ •μœΌλ‘œ μ œμž‘λ˜μ—ˆλ‹€. HBMκ³Ό ν”„λ‘œμ„Έμ„œ μ‚¬μ΄μ˜ silicon interposer channel 을 λͺ¨μ‚¬ν•˜κΈ° μœ„ν•œ 6 mm 의 채널이 μΉ© μœ„μ— κ΅¬ν˜„λ˜μ—ˆλ‹€. μ œμ•ˆν•˜λŠ” 크둜슀 토크 보상 방식은 0.5 um κ°„κ²©μ˜ 6개의 μΈμ ‘ν•œ 채널에 λ™μ‹œμ— 데이터λ₯Ό μ „μ†‘ν•˜μ—¬ κ²€μ¦λ˜μ—ˆμœΌλ©°, 크둜슀 ν† ν¬λ‘œ μΈν•œ 지터λ₯Ό μ΅œλŒ€ 78 % κ°μ†Œμ‹œμΌ°λ‹€. μ œμ•ˆν•˜λŠ” μ†‘μˆ˜μ‹ κΈ°λŠ” 8 Gb/s/um 의 μ²˜λ¦¬λŸ‰μ„ 가지며 6 개의 μ†‘μˆ˜μ‹ κΈ°κ°€ 총 36.6 mW의 μ „λ ₯을 μ†Œλͺ¨ν•˜μ˜€λ‹€.CHAPTER 1 INTRODUCTION 1 1.1 MOTIVATION 1 1.2 THESIS ORGANIZATION 4 CHAPTER 2 BACKGROUND ON HIGH-BANDWIDTH MEMORY 6 2.1 OVERVIEW 6 2.2 TRANSCEIVER ARCHITECTURE 10 2.3 READ/WRITE OPERATION 15 2.3.1 READ OPERATION 15 2.3.2 WRITE OPERATION 19 CHAPTER 3 BACKGROUNDS ON COUPLED WIRES 21 3.1 GENERALIZED MODEL 21 3.2 EFFECT OF CROSSTALK 26 CHAPTER 4 DQ RECEIVER WITH BAUD-RATE SELF-TRACKING LOOP 29 4.1 OVERVIEW 29 4.2 FEATURES OF DQ RECEIVER FOR HBM 33 4.3 PROPOSED PULSE-TO-CHARGE PHASE DETECTOR 35 4.3.1 OPERATION OF PULSE-TO-CHARGE PHASE DETECTOR 35 4.3.2 OFFSET CALIBRATION 37 4.3.3 OPERATION SEQUENCE 39 4.4 CIRCUIT IMPLEMENTATION 42 4.5 MEASUREMENT RESULT 46 CHAPTER 5 HIGH-DENSITY TRANSCEIVER FOR HBM WITH 3D-STAGGERED CHANNEL AND CROSSTALK CANCELLATION SCHEME 57 5.1 OVERVIEW 57 5.2 PROPOSED 3D-STAGGERED CHANNEL 61 5.2.1 IMPLEMENTATION OF 3D-STAGGERED CHANNEL 61 5.2.2 CHANNEL CHARACTERISTICS AND MODELING 66 5.3 PROPOSED FEED-FORWARD-EQUALIZER-COMBINED CROSSTALK CANCELLATION SCHEME 72 5.4 CIRCUIT IMPLEMENTATION 77 5.4.1 OVERALL ARCHITECTURE 77 5.4.2 TRANSMITTER WITH FFE-COMBINED XTC 79 5.4.3 RECEIVER 81 5.5 MEASUREMENT RESULT 82 CHAPTER 6 CONCLUSION 93 BIBLIOGRAPHY 95 초 둝 102Docto

    Novel Multicarrier Memory Channel Architecture Using Microwave Interconnects: Alleviating the Memory Wall

    Get PDF
    abstract: The increase in computing power has simultaneously increased the demand for input/output (I/O) bandwidth. Unfortunately, the speed of I/O and memory interconnects have not kept pace. Thus, processor-based systems are I/O and interconnect limited. The memory aggregated bandwidth is not scaling fast enough to keep up with increasing bandwidth demands. The term "memory wall" has been coined to describe this phenomenon. A new memory bus concept that has the potential to push double data rate (DDR) memory speed to 30 Gbit/s is presented. We propose to map the conventional DDR bus to a microwave link using a multicarrier frequency division multiplexing scheme. The memory bus is formed using a microwave signal carried within a waveguide. We call this approach multicarrier memory channel architecture (MCMCA). In MCMCA, each memory signal is modulated onto an RF carrier using 64-QAM format or higher. The carriers are then routed using substrate integrated waveguide (SIW) interconnects. At the receiver, the memory signals are demodulated and then delivered to SDRAM devices. We pioneered the usage of SIW as memory channel interconnects and demonstrated that it alleviates the memory bandwidth bottleneck. We demonstrated SIW performance superiority over conventional transmission line in immunity to cross-talk and electromagnetic interference. We developed a methodology based on design of experiment (DOE) and response surface method techniques that optimizes the design of SIW interconnects and minimizes its performance fluctuations under material and manufacturing variations. Along with using SIW, we implemented a multicarrier architecture which enabled the aggregated DDR bandwidth to reach 30 Gbit/s. We developed an end-to-end system model in Simulink and demonstrated the MCMCA performance for ultra-high throughput memory channel. Experimental characterization of the new channel shows that by using judicious frequency division multiplexing, as few as one SIW interconnect is sufficient to transmit the 64 DDR bits. Overall aggregated bus data rate achieves 240 GBytes/s data transfer with EVM not exceeding 2.26% and phase error of 1.07 degree or less.Dissertation/ThesisDoctoral Dissertation Electrical Engineering 201

    Hybrid NRZ/Multi-Tone Signaling for High-Speed Low-Power Wireline Transceivers

    Get PDF
    Over the past few decades, incessant growth of Internet networking traffic and High-Performance Computing (HPC) has led to a tremendous demand for data bandwidth. Digital communication technologies combined with advanced integrated circuit scaling trends have enabled the semiconductor and microelectronic industry to dramatically scale the bandwidth of high-loss interfaces such as Ethernet, backplane, and Digital Subscriber Line (DSL). The key to achieving higher bandwidth is to employ equalization technique to compensate the channel impairments such as Inter-Symbol Interference (ISI), crosstalk, and environmental noise. Therefore, todayΓ’s advanced input/outputs (I/Os) has been equipped with sophisticated equalization techniques to push beyond the uncompensated bandwidth of the system. To this end, process scaling has continually increased the data processing capability and improved the I/O performance over the last 15 years. However, since the channel bandwidth has not scaled with the same pace, the required signal processing and equalization circuitry becomes more and more complicated. Thereby, the energy efficiency improvements are largely offset by the energy needed to compensate channel impairments. In this design paradigm, re-thinking about the design strategies in order to not only satisfy the bandwidth performance, but also to improve power-performance becomes an important necessity. It is well known in communication theory that coding and signaling schemes have the potential to provide superior performance over band-limited channels. However, the choice of the optimum data communication algorithm should be considered by accounting for the circuit level power-performance trade-offs. In this thesis we have investigated the application of new algorithm and signaling schemes in wireline communications, especially for communication between microprocessors, memories, and peripherals. A new hybrid NRZ/Multi-Tone (NRZ/MT) signaling method has been developed during the course of this research. The system-level and circuit-level analysis, design, and implementation of the proposed signaling method has been performed in the frame of this work, and the silicon measurement results have proved the efficiency and the robustness of the proposed signaling methodology for wireline interfaces. In the first part of this work, a 7.5 Gb/s hybrid NRZ/MT transceiver (TRX) for multi-drop bus (MDB) memory interfaces is designed and fabricated in 40 nm CMOS technology. Reducing the complexity of the equalization circuitry on the receiver (RX) side, the proposed architecture achieves 1 pJ/bit link efficiency for a MDB channel bearing 45 dB loss at 2.5 GHz. The measurement results of the first prototype confirm that NRZ/MT serial data TRX can offer an energy-efficient solution for MDB memory interfaces. Motivated by the satisfying results of the first prototype, in the second phase of this research we have exploited the properties of multi-tone signaling, especially orthogonality among different sub-bands, to reduce the effect of crosstalk in high-dense wireline interconnects. A four-channel transceiver has been implemented in a standard CMOS 40 nm technology in order to demonstrate the performance of NRZ/MT signaling in presence of high channel loss and strong crosstalk noise. The proposed system achieves 1 pJ/bit power efficiency, while communicating over a MDB memory channel at 36 Gb/s aggregate data rate

    Toward realizing power scalable and energy proportional high-speed wireline links

    Get PDF
    Growing computational demand and proliferation of cloud computing has placed high-speed serial links at the center stage. Due to saturating energy efficiency improvements over the last five years, increasing the data throughput comes at the cost of power consumption. Conventionally, serial link power can be reduced by optimizing individual building blocks such as output drivers, receiver, or clock generation and distribution. However, this approach yields very limited efficiency improvement. This dissertation takes an alternative approach toward reducing the serial link power. Instead of optimizing the power of individual building blocks, power of the entire serial link is reduced by exploiting serial link usage by the applications. It has been demonstrated that serial links in servers are underutilized. On average, they are used only 15% of the time, i.e. these links are idle for approximately 85% of the time. Conventional links consume power during idle periods to maintain synchronization between the transmitter and the receiver. However, by powering-off the link when idle and powering it back when needed, power consumption of the serial link can be scaled proportionally to its utilization. This approach of rapid power state transitioning is known as the rapid-on/off approach. For the rapid-on/off to be effective, ideally the power-on time, off-state power, and power state transition energy must all be close to zero. However, in practice, it is very difficult to achieve these ideal conditions. Work presented in this dissertation addresses these challenges. When this research work was started (2011-12), there were only a couple of research papers available in the area of rapid-on/off links. Systematic study or design of a rapid power state transitioning in serial links was not available in the literature. Since rapid-on/off with nanoseconds granularity is not a standard in any wireline communication, even the popular test equipment does not support testing any such feature, neither any formal measurement methodology was available. All these circumstances made the beginning difficult. However, these challenges provided a unique opportunity to explore new architectural techniques and identify trade-offs. The key contributions of this dissertation are as follows. The first and foremost contribution is understanding the underlying limitations of saturating energy efficiency improvements in serial links and why there is a compelling need to find alternative ways to reduce the serial link power. The second contribution is to identify potential power saving techniques and evaluate the challenges they pose and the opportunities they present. The third contribution is the design of a 5Gb/s transmitter with a rapid-on/off feature. The transmitter achieves rapid-on/off capability in voltage mode output driver by using a fast-digital regulator, and in the clock multiplier by accurate frequency pre-setting and periodic reference insertion. To ease timing requirements, an improved edge replacement logic circuit for the clock multiplier is proposed. Mathematical modeling of power-on time as a function of various circuit parameters is also discussed. The proposed transmitter demonstrates energy proportional operation over wide variations of link utilization, and is, therefore, suitable for energy efficient links. Fabricated in 90nm CMOS technology, the voltage mode driver, and the clock multiplier achieve power-on-time of only 2ns and 10ns, respectively. This dissertation highlights key trade-off in the clock multiplier architecture, to achieve fast power-on-lock capability at the cost of jitter performance. The fourth contribution is the design of a 7GHz rapid-on/off LC-PLL based clock multi- plier. The phase locked loop (PLL) based multiplier was developed to overcome the limita- tions of the MDLL based approach. Proposed temperature compensated LC-PLL achieves power-on-lock in 1ns. The fifth and biggest contribution of this dissertation is the design of a 7Gb/s embedded clock transceiver, which achieves rapid-on/off capability in LC-PLL, current-mode transmit- ter and receiver. It was the first reported design of a complete transceiver, with an embedded clock architecture, having rapid-on/off capability. Background phase calibration technique in PLL and CDR phase calibration logic in the receiver enable instantaneous lock on power-on. The proposed transceiver demonstrates power scalability with a wide range of link utiliza- tion and, therefore, helps in improving overall system efficiency. Fabricated in 65nm CMOS technology, the 7Gb/s transceiver achieves power-on-lock in less than 20ns. The transceiver achieves power scaling by 44x (63.7mW-to-1.43mW) and energy efficiency degradation by only 2.2x (9.1pJ/bit-to-20.5pJ/bit), when the effective data rate (link utilization) changes by 100x (7Gb/s-to-70Mb/s). The sixth and final contribution is the design of a temperature sensor to compensate the frequency drifts due to temperature variations, during long power-off periods, in the fast power-on-lock LC-PLL. The proposed self-referenced VCO-based temperature sensor is designed with all digital logic gates and achieves low supply sensitivity. This sensor is suitable for integration in processor and DRAM environments. The proposed sensor works on the principle of directly converting temperature information to frequency and finally to digital bits. A novel sensing technique is proposed in which temperature information is acquired by creating a threshold voltage difference between the transistors used in the oscillators. Reduced supply sensitivity is achieved by employing junction capacitance, and the overhead of voltage regulators and an external ideal reference frequency is avoided. The effect of VCO phase noise on the sensor resolution is mathematically evaluated. Fabricated in the 65nm CMOS process, the prototype can operate with a supply ranging from 0.85V to 1.1V, and it achieves a supply sensitivity of 0.034oC/mV and an inaccuracy of Β±0.9oC and Β±2.3oC from 0-100oC after 2-point calibration, with and without static nonlinearity correction, respectively. It achieves a resolution of 0.3oC, resolution FoM of 0.3(nJ/conv)res2 , and measurement (conversion) time of 6.5ΞΌs

    Design Techniques for High Performance Serial Link Transceivers

    Get PDF
    Increasing data rates over electrical channels with significant frequency-dependent loss is difficult due to excessive inter-symbol interference (ISI). In order to achieve sufficient link margins at high rates, I/O system designers implement equalization in the transmitters and are motivated to consider more spectrally-efficient modulation formats relative to the common PAM-2 scheme, such as PAM-4 and duobinary. The first work, reviews when to consider PAM-4 and duobinary formats, as the modulation scheme which yields the highest system margins at a given data rate is a function of the channel loss profile, and presents a 20Gb/s triple-mode transmitter capable of efficiently implementing these three modulation schemes and three-tap feedforward equalization. A statistical link modeling tool, which models ISI, crosstalk, random noise, and timing jitter, is developed to compare the three common modulation formats operating on electrical backplane channel models. In order to improve duobinary modulation efficiency, a low-power quarter-rate duobinary precoder circuit is proposed which provides significant timing margin improvement relative to full-rate precoders. Also as serial I/O data rates scale above 10 Gb/s, crosstalk between neighboring channels degrades system bit-error rate (BER) performance. The next work presents receive-side circuitry which merges the cancellation of both near-end and far-end crosstalk (NEXT/FEXT) and can automatically adapt to different channel environments and variations in process, voltage, and temperature. NEXT cancellation is realized with a novel 3-tap FIR filter which combines two traditional FIR filter taps and a continuous-time band-pass filter IIR tap for efficient crosstalk cancellation, with all filter tap coefficients automatically determined via an ondie sign-sign least-mean-square (SS-LMS) adaptation engine. FEXT cancellation is realized by coupling the aggressor signal through a differentiator circuit whose gain is automatically adjusted with a power-detection-based adaptation loop. In conclusion, the proposed architectures in the transmitter side and receiver side together are to be good solution in the high speed I/O serial links to improve the performance by overcome the physical channel loss and adjacent channel noise as the system becomes complicated

    Solid State Circuits Technologies

    Get PDF
    The evolution of solid-state circuit technology has a long history within a relatively short period of time. This technology has lead to the modern information society that connects us and tools, a large market, and many types of products and applications. The solid-state circuit technology continuously evolves via breakthroughs and improvements every year. This book is devoted to review and present novel approaches for some of the main issues involved in this exciting and vigorous technology. The book is composed of 22 chapters, written by authors coming from 30 different institutions located in 12 different countries throughout the Americas, Asia and Europe. Thus, reflecting the wide international contribution to the book. The broad range of subjects presented in the book offers a general overview of the main issues in modern solid-state circuit technology. Furthermore, the book offers an in depth analysis on specific subjects for specialists. We believe the book is of great scientific and educational value for many readers. I am profoundly indebted to the support provided by all of those involved in the work. First and foremost I would like to acknowledge and thank the authors who worked hard and generously agreed to share their results and knowledge. Second I would like to express my gratitude to the Intech team that invited me to edit the book and give me their full support and a fruitful experience while working together to combine this book
    corecore