3 research outputs found

    Design of a high-density, low-power transceiver for next-generation HBM

    Thesis (Ph.D.) -- Seoul National University Graduate School, College of Engineering, Department of Electrical and Computer Engineering, August 2020. 정덕균.

    This thesis presents design techniques for a high-density, power-efficient transceiver for next-generation high bandwidth memory (HBM). Unlike other memory interfaces, HBM uses a 3D-stacked package built with through-silicon vias (TSVs) and a silicon interposer, and its transceiver must solve the problems this packaging causes.

    First, a data (DQ) receiver for HBM is proposed with a self-tracking loop that tracks the phase skew between DQ and the data strobe (DQS) caused by voltage or thermal drift. The self-tracking loop achieves low power and small area by using an analog-assisted baud-rate phase detector. The proposed pulse-to-charge (PC) phase detector (PD) converts the phase skew to a voltage difference and detects the skew from that difference. An offset calibration scheme that compensates for mismatch in the PD is also proposed; it operates without any additional sensing circuits by taking advantage of the write training of HBM. Fabricated in 65 nm CMOS, the DQ receiver shows a power efficiency of 370 fJ/b at 4.8 Gb/s and occupies 0.0056 mm². The experimental results show that the DQ receiver operates without any performance degradation under a ±10% supply variation.

    In a second prototype IC, a high-density transceiver for HBM with a feed-forward-equalizer (FFE)-combined crosstalk (XT) cancellation scheme is presented. To compensate for the XT, the transmitter pre-distorts the amplitude of the FFE output according to the XT. Because the proposed XT cancellation (XTC) scheme reuses the FFE already implemented to equalize the channel loss, the additional circuits needed for XTC are minimized. With the XTC scheme, the channel pitch can be significantly reduced, allowing a high channel density. Moreover, the 3D-staggered channel structure removes the ground layer between vertically adjacent channels, which further reduces the cross-sectional area of the channel per lane. The test chip, which includes 6 data lanes, is fabricated in 65 nm CMOS technology. The 6-mm channels are implemented on chip to emulate the silicon interposer between the HBM and the processor. The operation of the XTC scheme is verified by simultaneously transmitting 4-Gb/s data over 6 consecutive channels with 0.5-µm pitch, and the XTC scheme reduces the XT-induced jitter by up to 78%. The measurement results show that the transceiver achieves a throughput of 8 Gb/s/µm. The transceiver occupies 0.05 mm² for 6 lanes and consumes 36.6 mW at 6 x 4 Gb/s.

    Contents:
    Chapter 1 Introduction
        1.1 Motivation
        1.2 Thesis organization
    Chapter 2 Background on high-bandwidth memory
        2.1 Overview
        2.2 Transceiver architecture
        2.3 Read/write operation
            2.3.1 Read operation
            2.3.2 Write operation
    Chapter 3 Backgrounds on coupled wires
        3.1 Generalized model
        3.2 Effect of crosstalk
    Chapter 4 DQ receiver with baud-rate self-tracking loop
        4.1 Overview
        4.2 Features of DQ receiver for HBM
        4.3 Proposed pulse-to-charge phase detector
            4.3.1 Operation of pulse-to-charge phase detector
            4.3.2 Offset calibration
            4.3.3 Operation sequence
        4.4 Circuit implementation
        4.5 Measurement result
    Chapter 5 High-density transceiver for HBM with 3D-staggered channel and crosstalk cancellation scheme
        5.1 Overview
        5.2 Proposed 3D-staggered channel
            5.2.1 Implementation of 3D-staggered channel
            5.2.2 Channel characteristics and modeling
        5.3 Proposed feed-forward-equalizer-combined crosstalk cancellation scheme
        5.4 Circuit implementation
            5.4.1 Overall architecture
            5.4.2 Transmitter with FFE-combined XTC
            5.4.3 Receiver
        5.5 Measurement result
    Chapter 6 Conclusion
    Bibliography
    Abstract (in Korean)
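    The figures quoted in the abstract above can be cross-checked with simple unit arithmetic. The sketch below is only an illustration: the function and variable names are hypothetical, and the derived values (receiver power, per-bit energy of the transceiver) are computed here from the quoted numbers rather than reported by the thesis. It also assumes that the 8 Gb/s/µm throughput figure is the per-lane data rate divided by the channel pitch, which is one reading that reproduces the quoted value.

```python
# Cross-check of the numbers quoted in the abstract.
# All names are illustrative; derived values are not taken from the thesis.

def power_mw_from_energy(energy_fj_per_bit, rate_gbps):
    """Power in mW: fJ/b * Gb/s = 1e-15 J/b * 1e9 b/s = 1e-6 W = 1e-3 mW."""
    return energy_fj_per_bit * rate_gbps * 1e-3

def energy_pj_per_bit(total_power_mw, aggregate_rate_gbps):
    """Energy per bit in pJ/b: mW / (Gb/s) = 1e-3 W / 1e9 b/s = 1e-12 J/b."""
    return total_power_mw / aggregate_rate_gbps

def throughput_density(rate_gbps_per_lane, pitch_um):
    """Assumed definition: per-lane rate divided by channel pitch, in Gb/s/um."""
    return rate_gbps_per_lane / pitch_um

# DQ receiver: 370 fJ/b at 4.8 Gb/s
dq_power_mw = power_mw_from_energy(370, 4.8)

# Transceiver: 36.6 mW at 6 lanes x 4 Gb/s
trx_energy_pj = energy_pj_per_bit(36.6, 6 * 4)

# Channel density: 4 Gb/s per lane at 0.5 um pitch
density = throughput_density(4, 0.5)

print(f"DQ receiver power: {dq_power_mw:.3f} mW")
print(f"Transceiver energy: {trx_energy_pj:.3f} pJ/b")
print(f"Throughput density: {density:.1f} Gb/s/um")
```

    Under these assumptions the receiver figure corresponds to roughly 1.8 mW, the six-lane transceiver to roughly 1.5 pJ/b, and the density calculation matches the 8 Gb/s/µm stated in the abstract.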

    Low-power and high-fanout bus design techniques

    Low power consumption is an important concern when designing autonomous electronic devices, and upcoming applications increasingly demand both high performance and low power. This thesis reviews two low-power, high-fanout bus design techniques: Pulse Width Modulation (PWM) and Time-Domain Conversion (TDC). Schematic simulations (Cadence) and quantitative, comparative results for both approaches are included. On-chip wire theory is also presented, along with optimized bus simulation models (MATLAB) and a summary of the main application areas for these techniques. Finally, two ready-to-use library cells are generated, as well as Verilog code for the TDC system.

    An Energy-Efficient Reconfigurable Mobile Memory Interface for Computing Systems

    The need for transceivers with higher power efficiency and bandwidth has grown significantly as mobile devices, such as smart phones, laptops, tablets, and ultra-portable personal digital assistants, continue to be built from heterogeneous intellectual properties such as central processing units (CPUs), graphics processing units (GPUs), digital signal processors, dynamic random-access memories (DRAMs), sensors, and graphics/image processing units, and continue to gain enhanced graphics computing and video processing capabilities. However, the current mobile interface technologies that support CPU-to-memory communication (e.g. baseband-only signaling) have critical limitations, particularly super-linear energy consumption, limited bandwidth, and non-reconfigurable data access. As a consequence, both energy efficiency and bandwidth must improve for future mobile devices.

    The primary goal of this study is to design an energy-efficient reconfigurable mobile memory interface for mobile computing systems that substantially improves circuit and system bandwidth and power efficiency. The proposed interface, which uses an advanced baseband (BB) signaling and an RF-band signaling, is capable of simultaneous bi-directional communication and reconfigurable data access. It also increases power efficiency and bandwidth between mobile CPUs and memory subsystems on a single-ended shared transmission line. Moreover, because multiple data streams share a single-ended transmission line, the number of transmission lines between the mobile CPU and memories is considerably reduced. This enables more compact devices and lower-cost packaging for the mobile communication interface, and establishes the principles and feasibility of technologies for future mobile system applications. The operation and performance of the proposed transceiver are analyzed and its circuit implementation is discussed in detail. A chip prototype of the transceiver was implemented in a 65 nm CMOS process technology. In measurement, the transceiver exhibits higher aggregate data throughput and better energy efficiency than prior works.