5 research outputs found

    An Eight lanes 7Gb/s/pin Source Synchronous Single-Ended RX with Equalization and Far-End Crosstalk Cancellation for Backplane Channels

    Get PDF
    This paper presents a versatile crosstalk cancellation scheme for single-ended multi-lane backplane links. System-level investigations show that a scheme, which combines analog filters and decision-feedback crosstalk compensation on the receiver (RX) side only, can efficiently remove crosstalk patterns in straight channels as well as boards with reflections due to via stubs. An eight-lane single-ended RX has been manufactured in 32-nm SOI CMOS to validate our findings. A CTLE and eight-tap decision feedback equalizer equalize the channel without transmitter feedforward equalizer. A continuous time crosstalk canceller reduces precursors by nearest neighbors, while the residual postcursors from all aggressors are suppressed by direct feedback 7x8-tap decision-feedback crosstalk canceller (DFXC). Measurements with flip-chip packaged RX show that the RX macro can equalize both a 30-dB insertion loss single-ended channel with 0-dB signal-to-crosstalk at Nyquist and a channel with 28-dB attenuation with the signal-to-crosstalk ratio of 6 dB combined with reflections due to via stubs. The RX operates up to 7 Gb/s/pin with PRBS11 data at bit error rate (BER) <10⁻¹², and occupies 300x350 μm² with an energy efficiency of 5.9 mW/Gb/s from 1-V supply

    A 5 Gb/s Single-Ended Parallel Receiver With Adaptive Crosstalk-Induced Jitter Cancellation

    No full text
    This paper presents an adaptive far-end crosstalk cancellation scheme for a single-ended parallel receiver. The adaptation engine is embedded in a single representative channel CDR, and the receiver efficiently reduces the crosstalk noise with a minimal cost in hardware and power consumption. In addition, the proposed scheme can be applied to any given CDR and equalizing circuits. The receiver is fabricated in 0.13 mu m CMOS technology and achieves a reduction of FEXT-induced jitter up to 75%. The receiver consumes 65 mW at 5 Gb/s (4.3 mW/Gb/s/pin) including a PLL for global clock distribution.X1198sciescopu

    μ°¨μ„ΈλŒ€ HBM 용 고집적, μ €μ „λ ₯ μ†‘μˆ˜μ‹ κΈ° 섀계

    Get PDF
    ν•™μœ„λ…Όλ¬Έ (박사) -- μ„œμšΈλŒ€ν•™κ΅ λŒ€ν•™μ› : κ³΅κ³ΌλŒ€ν•™ 전기·정보곡학뢀, 2020. 8. 정덕균.This thesis presents design techniques for high-density power-efficient transceiver for the next-generation high bandwidth memory (HBM). Unlike the other memory interfaces, HBM uses a 3D-stacked package using through-silicon via (TSV) and a silicon interposer. The transceiver for HBM should be able to solve the problems caused by the 3D-stacked package and TSV. At first, a data (DQ) receiver for HBM with a self-tracking loop that tracks a phase skew between DQ and data strobe (DQS) due to a voltage or thermal drift is proposed. The self-tracking loop achieves low power and small area by uti-lizing an analog-assisted baud-rate phase detector. The proposed pulse-to-charge (PC) phase detector (PD) converts the phase skew to a voltage differ-ence and detects the phase skew from the voltage difference. An offset calibra-tion scheme that can compensates for a mismatch of the PD is also proposed. The proposed calibration scheme operates without any additional sensing cir-cuits by taking advantage of the write training of HBM. Fabricated in 65 nm CMOS, the DQ receiver shows a power efficiency of 370 fJ/b at 4.8 Gb/s and occupies 0.0056 mm2. The experimental results show that the DQ receiver op-erates without any performance degradation under a Β± 10% supply variation. In a second prototype IC, a high-density transceiver for HBM with a feed-forward-equalizer (FFE)-combined crosstalk (XT) cancellation scheme is pre-sented. To compensate for the XT, the transmitter pre-distorts the amplitude of the FFE output according to the XT. Since the proposed XT cancellation (XTC) scheme reuses the FFE implemented to equalize the channel loss, additional circuits for the XTC is minimized. Thanks to the XTC scheme, a channel pitch can be significantly reduced, allowing for the high channel density. Moreover, the 3D-staggered channel structure removes the ground layer between the verti-cally adjacent channels, which further reduces a cross-sectional area of the channel per lane. The test chip including 6 data lanes is fabricated in 65 nm CMOS technology. The 6-mm channels are implemented on chip to emulate the silicon interposer between the HBM and the processor. The operation of the XTC scheme is verified by simultaneously transmitting 4-Gb/s data to the 6 consecutive channels with 0.5-um pitch and the XTC scheme reduces the XT-induced jitter up to 78 %. The measurement result shows that the transceiver achieves the throughput of 8 Gb/s/um. The transceiver occupies 0.05 mm2 for 6 lanes and consumes 36.6 mW at 6 x 4 Gb/s.λ³Έ λ…Όλ¬Έμ—μ„œλŠ” μ°¨μ„ΈλŒ€ HBM을 μœ„ν•œ 고집적 μ €μ „λ ₯ μ†‘μˆ˜μ‹ κΈ° 섀계 방법을 μ œμ•ˆν•œλ‹€. 첫 번째둜, μ „μ•• 및 μ˜¨λ„ 변화에 μ˜ν•œ 데이터와 클럭 κ°„ μœ„μƒ 차이λ₯Ό 보상할 수 μžˆλŠ” 자체 좔적 루프λ₯Ό 가진 데이터 μˆ˜μ‹ κΈ°λ₯Ό μ œμ•ˆν•œλ‹€. μ œμ•ˆν•˜λŠ” 자체 좔적 λ£¨ν”„λŠ” 데이터 전솑 속도와 같은 μ†λ„λ‘œ λ™μž‘ν•˜λŠ” μœ„μƒ κ²€μΆœκΈ°λ₯Ό μ‚¬μš©ν•˜μ—¬ μ „λ ₯ μ†Œλͺ¨μ™€ 면적을 μ€„μ˜€λ‹€. λ˜ν•œ λ©”λͺ¨λ¦¬μ˜ μ“°κΈ° ν›ˆλ ¨ (write training) 과정을 μ΄μš©ν•˜μ—¬ 효과적으둜 μœ„μƒ κ²€μΆœκΈ°μ˜ μ˜€ν”„μ…‹μ„ 보상할 수 μžˆλŠ” 방법을 μ œμ•ˆν•œλ‹€. μ œμ•ˆν•˜λŠ” 데이터 μˆ˜μ‹ κΈ°λŠ” 65 nm κ³΅μ •μœΌλ‘œ μ œμž‘λ˜μ–΄ 4.8 Gb/sμ—μ„œ 370 fJ/b을 μ†Œλͺ¨ν•˜μ˜€λ‹€. λ˜ν•œ 10 % 의 μ „μ•• 변화에 λŒ€ν•˜μ—¬ μ•ˆμ •μ μœΌλ‘œ λ™μž‘ν•˜λŠ” 것을 ν™•μΈν•˜μ˜€λ‹€. 두 번째둜, ν”Όλ“œ ν¬μ›Œλ“œ 이퀄라이저와 κ²°ν•©λœ 크둜슀 토크 보상 방식을 ν™œμš©ν•œ 고집적 μ†‘μˆ˜μ‹ κΈ°λ₯Ό μ œμ•ˆν•œλ‹€. μ œμ•ˆν•˜λŠ” μ†‘μ‹ κΈ°λŠ” 크둜슀 토크 크기에 ν•΄λ‹Ήν•˜λŠ” 만큼 솑신기 좜λ ₯을 μ™œκ³‘ν•˜μ—¬ 크둜슀 토크λ₯Ό λ³΄μƒν•œλ‹€. μ œμ•ˆν•˜λŠ” 크둜슀 토크 보상 방식은 채널 손싀을 λ³΄μƒν•˜κΈ° μœ„ν•΄ κ΅¬ν˜„λœ ν”Όλ“œ ν¬μ›Œλ“œ 이퀄라이저λ₯Ό μž¬ν™œμš©ν•¨μœΌλ‘œμ¨ 좔가적인 회둜λ₯Ό μ΅œμ†Œν™”ν•œλ‹€. μ œμ•ˆν•˜λŠ” μ†‘μˆ˜μ‹ κΈ°λŠ” 크둜슀 토크가 보상 κ°€λŠ₯ν•˜κΈ° λ•Œλ¬Έμ—, 채널 간격을 크게 쀄여 고집적 톡신을 κ΅¬ν˜„ν•˜μ˜€λ‹€. λ˜ν•œ 집적도λ₯Ό 더 μ¦κ°€μ‹œν‚€κΈ° μœ„ν•΄ μ„Έλ‘œλ‘œ μΈμ ‘ν•œ 채널 μ‚¬μ΄μ˜ 차폐 측을 μ œκ±°ν•œ 적측 채널 ꡬ쑰λ₯Ό μ œμ•ˆν•œλ‹€. 6개의 μ†‘μˆ˜μ‹ κΈ°λ₯Ό ν¬ν•¨ν•œ ν”„λ‘œν† νƒ€μž… 칩은 65 nm κ³΅μ •μœΌλ‘œ μ œμž‘λ˜μ—ˆλ‹€. HBMκ³Ό ν”„λ‘œμ„Έμ„œ μ‚¬μ΄μ˜ silicon interposer channel 을 λͺ¨μ‚¬ν•˜κΈ° μœ„ν•œ 6 mm 의 채널이 μΉ© μœ„μ— κ΅¬ν˜„λ˜μ—ˆλ‹€. μ œμ•ˆν•˜λŠ” 크둜슀 토크 보상 방식은 0.5 um κ°„κ²©μ˜ 6개의 μΈμ ‘ν•œ 채널에 λ™μ‹œμ— 데이터λ₯Ό μ „μ†‘ν•˜μ—¬ κ²€μ¦λ˜μ—ˆμœΌλ©°, 크둜슀 ν† ν¬λ‘œ μΈν•œ 지터λ₯Ό μ΅œλŒ€ 78 % κ°μ†Œμ‹œμΌ°λ‹€. μ œμ•ˆν•˜λŠ” μ†‘μˆ˜μ‹ κΈ°λŠ” 8 Gb/s/um 의 μ²˜λ¦¬λŸ‰μ„ 가지며 6 개의 μ†‘μˆ˜μ‹ κΈ°κ°€ 총 36.6 mW의 μ „λ ₯을 μ†Œλͺ¨ν•˜μ˜€λ‹€.CHAPTER 1 INTRODUCTION 1 1.1 MOTIVATION 1 1.2 THESIS ORGANIZATION 4 CHAPTER 2 BACKGROUND ON HIGH-BANDWIDTH MEMORY 6 2.1 OVERVIEW 6 2.2 TRANSCEIVER ARCHITECTURE 10 2.3 READ/WRITE OPERATION 15 2.3.1 READ OPERATION 15 2.3.2 WRITE OPERATION 19 CHAPTER 3 BACKGROUNDS ON COUPLED WIRES 21 3.1 GENERALIZED MODEL 21 3.2 EFFECT OF CROSSTALK 26 CHAPTER 4 DQ RECEIVER WITH BAUD-RATE SELF-TRACKING LOOP 29 4.1 OVERVIEW 29 4.2 FEATURES OF DQ RECEIVER FOR HBM 33 4.3 PROPOSED PULSE-TO-CHARGE PHASE DETECTOR 35 4.3.1 OPERATION OF PULSE-TO-CHARGE PHASE DETECTOR 35 4.3.2 OFFSET CALIBRATION 37 4.3.3 OPERATION SEQUENCE 39 4.4 CIRCUIT IMPLEMENTATION 42 4.5 MEASUREMENT RESULT 46 CHAPTER 5 HIGH-DENSITY TRANSCEIVER FOR HBM WITH 3D-STAGGERED CHANNEL AND CROSSTALK CANCELLATION SCHEME 57 5.1 OVERVIEW 57 5.2 PROPOSED 3D-STAGGERED CHANNEL 61 5.2.1 IMPLEMENTATION OF 3D-STAGGERED CHANNEL 61 5.2.2 CHANNEL CHARACTERISTICS AND MODELING 66 5.3 PROPOSED FEED-FORWARD-EQUALIZER-COMBINED CROSSTALK CANCELLATION SCHEME 72 5.4 CIRCUIT IMPLEMENTATION 77 5.4.1 OVERALL ARCHITECTURE 77 5.4.2 TRANSMITTER WITH FFE-COMBINED XTC 79 5.4.3 RECEIVER 81 5.5 MEASUREMENT RESULT 82 CHAPTER 6 CONCLUSION 93 BIBLIOGRAPHY 95 초 둝 102Docto

    Design Techniques for High Performance Serial Link Transceivers

    Get PDF
    Increasing data rates over electrical channels with significant frequency-dependent loss is difficult due to excessive inter-symbol interference (ISI). In order to achieve sufficient link margins at high rates, I/O system designers implement equalization in the transmitters and are motivated to consider more spectrally-efficient modulation formats relative to the common PAM-2 scheme, such as PAM-4 and duobinary. The first work, reviews when to consider PAM-4 and duobinary formats, as the modulation scheme which yields the highest system margins at a given data rate is a function of the channel loss profile, and presents a 20Gb/s triple-mode transmitter capable of efficiently implementing these three modulation schemes and three-tap feedforward equalization. A statistical link modeling tool, which models ISI, crosstalk, random noise, and timing jitter, is developed to compare the three common modulation formats operating on electrical backplane channel models. In order to improve duobinary modulation efficiency, a low-power quarter-rate duobinary precoder circuit is proposed which provides significant timing margin improvement relative to full-rate precoders. Also as serial I/O data rates scale above 10 Gb/s, crosstalk between neighboring channels degrades system bit-error rate (BER) performance. The next work presents receive-side circuitry which merges the cancellation of both near-end and far-end crosstalk (NEXT/FEXT) and can automatically adapt to different channel environments and variations in process, voltage, and temperature. NEXT cancellation is realized with a novel 3-tap FIR filter which combines two traditional FIR filter taps and a continuous-time band-pass filter IIR tap for efficient crosstalk cancellation, with all filter tap coefficients automatically determined via an ondie sign-sign least-mean-square (SS-LMS) adaptation engine. FEXT cancellation is realized by coupling the aggressor signal through a differentiator circuit whose gain is automatically adjusted with a power-detection-based adaptation loop. In conclusion, the proposed architectures in the transmitter side and receiver side together are to be good solution in the high speed I/O serial links to improve the performance by overcome the physical channel loss and adjacent channel noise as the system becomes complicated

    Learning-Based Hardware Design for Data Acquisition Systems

    Get PDF
    This multidisciplinary research work aims to investigate the optimized information extraction from signals or data volumes and to develop tailored hardware implementations that trade-off the complexity of data acquisition with that of data processing, conceptually allowing radically new device designs. The mathematical results in classical Compressive Sampling (CS) support the paradigm of Analog-to-Information Conversion (AIC) as a replacement for conventional ADC technologies. The AICs simultaneously perform data acquisition and compression, seeking to directly sample signals for achieving specific tasks as opposed to acquiring a full signal only at the Nyquist rate to throw most of it away via compression. Our contention is that in order for CS to live up its name, both theory and practice must leverage concepts from learning. This work demonstrates our contention in hardware prototypes, with key trade-offs, for two different fields of application as edge and big-data computing. In the framework of edge-data computing, such as wearable and implantable ecosystems, the power budget is defined by the battery capacity, which generally limits the device performance and usability. This is more evident in very challenging field, such as medical monitoring, where high performance requirements are necessary for the device to process the information with high accuracy. Furthermore, in applications like implantable medical monitoring, the system performances have to merge the small area as well as the low-power requirements, in order to facilitate the implant bio-compatibility, avoiding the rejection from the human body. Based on our new mathematical foundations, we built different prototypes to get a neural signal acquisition chip that not only rigorously trades off its area, energy consumption, and the quality of its signal output, but also significantly outperforms the state-of-the-art in all aspects. In the framework of big-data and high-performance computation, such as in high-end servers application, the RF circuits meant to transmit data from chip-to-chip or chip-to-memory are defined by low power requirements, since the heat generated by the integrated circuits is partially distributed by the chip package. Hence, the overall system power budget is defined by its affordable cooling capacity. For this reason, application specific architectures and innovative techniques are used for low-power implementation. In this work, we have developed a single-ended multi-lane receiver for high speed I/O link in servers application. The receiver operates at 7 Gbps by learning inter-symbol interference and electromagnetic coupling noise in chip-to-chip communication systems. A learning-based approach allows a versatile receiver circuit which not only copes with large channel attenuation but also implements novel crosstalk reduction techniques, to allow single-ended multiple lines transmission, without sacrificing its overall bandwidth for a given area within the interconnect's data-path
    corecore