Search CORE

13 research outputs found

차세대 HBM 용 고집적, 저전력 송수신기 설계

Author: 고한곤
Publication venue: 서울대학교 대학원
Publication date: 01/08/2020
Field of study

학위논문 (박사) -- 서울대학교 대학원 : 공과대학 전기·정보공학부, 2020. 8. 정덕균.This thesis presents design techniques for high-density power-efficient transceiver for the next-generation high bandwidth memory (HBM). Unlike the other memory interfaces, HBM uses a 3D-stacked package using through-silicon via (TSV) and a silicon interposer. The transceiver for HBM should be able to solve the problems caused by the 3D-stacked package and TSV. At first, a data (DQ) receiver for HBM with a self-tracking loop that tracks a phase skew between DQ and data strobe (DQS) due to a voltage or thermal drift is proposed. The self-tracking loop achieves low power and small area by uti-lizing an analog-assisted baud-rate phase detector. The proposed pulse-to-charge (PC) phase detector (PD) converts the phase skew to a voltage differ-ence and detects the phase skew from the voltage difference. An offset calibra-tion scheme that can compensates for a mismatch of the PD is also proposed. The proposed calibration scheme operates without any additional sensing cir-cuits by taking advantage of the write training of HBM. Fabricated in 65 nm CMOS, the DQ receiver shows a power efficiency of 370 fJ/b at 4.8 Gb/s and occupies 0.0056 mm2. The experimental results show that the DQ receiver op-erates without any performance degradation under a ± 10% supply variation. In a second prototype IC, a high-density transceiver for HBM with a feed-forward-equalizer (FFE)-combined crosstalk (XT) cancellation scheme is pre-sented. To compensate for the XT, the transmitter pre-distorts the amplitude of the FFE output according to the XT. Since the proposed XT cancellation (XTC) scheme reuses the FFE implemented to equalize the channel loss, additional circuits for the XTC is minimized. Thanks to the XTC scheme, a channel pitch can be significantly reduced, allowing for the high channel density. Moreover, the 3D-staggered channel structure removes the ground layer between the verti-cally adjacent channels, which further reduces a cross-sectional area of the channel per lane. The test chip including 6 data lanes is fabricated in 65 nm CMOS technology. The 6-mm channels are implemented on chip to emulate the silicon interposer between the HBM and the processor. The operation of the XTC scheme is verified by simultaneously transmitting 4-Gb/s data to the 6 consecutive channels with 0.5-um pitch and the XTC scheme reduces the XT-induced jitter up to 78 %. The measurement result shows that the transceiver achieves the throughput of 8 Gb/s/um. The transceiver occupies 0.05 mm2 for 6 lanes and consumes 36.6 mW at 6 x 4 Gb/s.본 논문에서는 차세대 HBM을 위한 고집적 저전력 송수신기 설계 방법을 제안한다. 첫 번째로, 전압 및 온도 변화에 의한 데이터와 클럭 간 위상 차이를 보상할 수 있는 자체 추적 루프를 가진 데이터 수신기를 제안한다. 제안하는 자체 추적 루프는 데이터 전송 속도와 같은 속도로 동작하는 위상 검출기를 사용하여 전력 소모와 면적을 줄였다. 또한 메모리의 쓰기 훈련 (write training) 과정을 이용하여 효과적으로 위상 검출기의 오프셋을 보상할 수 있는 방법을 제안한다. 제안하는 데이터 수신기는 65 nm 공정으로 제작되어 4.8 Gb/s에서 370 fJ/b을 소모하였다. 또한 10 % 의 전압 변화에 대하여 안정적으로 동작하는 것을 확인하였다. 두 번째로, 피드 포워드 이퀄라이저와 결합된 크로스 토크 보상 방식을 활용한 고집적 송수신기를 제안한다. 제안하는 송신기는 크로스 토크 크기에 해당하는 만큼 송신기 출력을 왜곡하여 크로스 토크를 보상한다. 제안하는 크로스 토크 보상 방식은 채널 손실을 보상하기 위해 구현된 피드 포워드 이퀄라이저를 재활용함으로써 추가적인 회로를 최소화한다. 제안하는 송수신기는 크로스 토크가 보상 가능하기 때문에, 채널 간격을 크게 줄여 고집적 통신을 구현하였다. 또한 집적도를 더 증가시키기 위해 세로로 인접한 채널 사이의 차폐 층을 제거한 적층 채널 구조를 제안한다. 6개의 송수신기를 포함한 프로토타입 칩은 65 nm 공정으로 제작되었다. HBM과 프로세서 사이의 silicon interposer channel 을 모사하기 위한 6 mm 의 채널이 칩 위에 구현되었다. 제안하는 크로스 토크 보상 방식은 0.5 um 간격의 6개의 인접한 채널에 동시에 데이터를 전송하여 검증되었으며, 크로스 토크로 인한 지터를 최대 78 % 감소시켰다. 제안하는 송수신기는 8 Gb/s/um 의 처리량을 가지며 6 개의 송수신기가 총 36.6 mW의 전력을 소모하였다.CHAPTER 1 INTRODUCTION 1 1.1 MOTIVATION 1 1.2 THESIS ORGANIZATION 4 CHAPTER 2 BACKGROUND ON HIGH-BANDWIDTH MEMORY 6 2.1 OVERVIEW 6 2.2 TRANSCEIVER ARCHITECTURE 10 2.3 READ/WRITE OPERATION 15 2.3.1 READ OPERATION 15 2.3.2 WRITE OPERATION 19 CHAPTER 3 BACKGROUNDS ON COUPLED WIRES 21 3.1 GENERALIZED MODEL 21 3.2 EFFECT OF CROSSTALK 26 CHAPTER 4 DQ RECEIVER WITH BAUD-RATE SELF-TRACKING LOOP 29 4.1 OVERVIEW 29 4.2 FEATURES OF DQ RECEIVER FOR HBM 33 4.3 PROPOSED PULSE-TO-CHARGE PHASE DETECTOR 35 4.3.1 OPERATION OF PULSE-TO-CHARGE PHASE DETECTOR 35 4.3.2 OFFSET CALIBRATION 37 4.3.3 OPERATION SEQUENCE 39 4.4 CIRCUIT IMPLEMENTATION 42 4.5 MEASUREMENT RESULT 46 CHAPTER 5 HIGH-DENSITY TRANSCEIVER FOR HBM WITH 3D-STAGGERED CHANNEL AND CROSSTALK CANCELLATION SCHEME 57 5.1 OVERVIEW 57 5.2 PROPOSED 3D-STAGGERED CHANNEL 61 5.2.1 IMPLEMENTATION OF 3D-STAGGERED CHANNEL 61 5.2.2 CHANNEL CHARACTERISTICS AND MODELING 66 5.3 PROPOSED FEED-FORWARD-EQUALIZER-COMBINED CROSSTALK CANCELLATION SCHEME 72 5.4 CIRCUIT IMPLEMENTATION 77 5.4.1 OVERALL ARCHITECTURE 77 5.4.2 TRANSMITTER WITH FFE-COMBINED XTC 79 5.4.3 RECEIVER 81 5.5 MEASUREMENT RESULT 82 CHAPTER 6 CONCLUSION 93 BIBLIOGRAPHY 95 초 록 102Docto

SNU Open Repository and Archive

Novel Multicarrier Memory Channel Architecture Using Microwave Interconnects: Alleviating the Memory Wall

Author
Publication venue
Publication date: 01/01/2018
Field of study

abstract: The increase in computing power has simultaneously increased the demand for input/output (I/O) bandwidth. Unfortunately, the speed of I/O and memory interconnects have not kept pace. Thus, processor-based systems are I/O and interconnect limited. The memory aggregated bandwidth is not scaling fast enough to keep up with increasing bandwidth demands. The term "memory wall" has been coined to describe this phenomenon. A new memory bus concept that has the potential to push double data rate (DDR) memory speed to 30 Gbit/s is presented. We propose to map the conventional DDR bus to a microwave link using a multicarrier frequency division multiplexing scheme. The memory bus is formed using a microwave signal carried within a waveguide. We call this approach multicarrier memory channel architecture (MCMCA). In MCMCA, each memory signal is modulated onto an RF carrier using 64-QAM format or higher. The carriers are then routed using substrate integrated waveguide (SIW) interconnects. At the receiver, the memory signals are demodulated and then delivered to SDRAM devices. We pioneered the usage of SIW as memory channel interconnects and demonstrated that it alleviates the memory bandwidth bottleneck. We demonstrated SIW performance superiority over conventional transmission line in immunity to cross-talk and electromagnetic interference. We developed a methodology based on design of experiment (DOE) and response surface method techniques that optimizes the design of SIW interconnects and minimizes its performance fluctuations under material and manufacturing variations. Along with using SIW, we implemented a multicarrier architecture which enabled the aggregated DDR bandwidth to reach 30 Gbit/s. We developed an end-to-end system model in Simulink and demonstrated the MCMCA performance for ultra-high throughput memory channel. Experimental characterization of the new channel shows that by using judicious frequency division multiplexing, as few as one SIW interconnect is sufficient to transmit the 64 DDR bits. Overall aggregated bus data rate achieves 240 GBytes/s data transfer with EVM not exceeding 2.26% and phase error of 1.07 degree or less.Dissertation/ThesisDoctoral Dissertation Electrical Engineering 201

ASU Digital Repository

Hybrid NRZ/Multi-Tone Signaling for High-Speed Low-Power Wireline Transceivers

Author: Gharibdoust Kiarash
Publication venue: Lausanne, EPFL
Publication date: 19/04/2016
Field of study

Over the past few decades, incessant growth of Internet networking traffic and High-Performance Computing (HPC) has led to a tremendous demand for data bandwidth. Digital communication technologies combined with advanced integrated circuit scaling trends have enabled the semiconductor and microelectronic industry to dramatically scale the bandwidth of high-loss interfaces such as Ethernet, backplane, and Digital Subscriber Line (DSL). The key to achieving higher bandwidth is to employ equalization technique to compensate the channel impairments such as Inter-Symbol Interference (ISI), crosstalk, and environmental noise. Therefore, todayâs advanced input/outputs (I/Os) has been equipped with sophisticated equalization techniques to push beyond the uncompensated bandwidth of the system. To this end, process scaling has continually increased the data processing capability and improved the I/O performance over the last 15 years. However, since the channel bandwidth has not scaled with the same pace, the required signal processing and equalization circuitry becomes more and more complicated. Thereby, the energy efficiency improvements are largely offset by the energy needed to compensate channel impairments. In this design paradigm, re-thinking about the design strategies in order to not only satisfy the bandwidth performance, but also to improve power-performance becomes an important necessity. It is well known in communication theory that coding and signaling schemes have the potential to provide superior performance over band-limited channels. However, the choice of the optimum data communication algorithm should be considered by accounting for the circuit level power-performance trade-offs. In this thesis we have investigated the application of new algorithm and signaling schemes in wireline communications, especially for communication between microprocessors, memories, and peripherals. A new hybrid NRZ/Multi-Tone (NRZ/MT) signaling method has been developed during the course of this research. The system-level and circuit-level analysis, design, and implementation of the proposed signaling method has been performed in the frame of this work, and the silicon measurement results have proved the efficiency and the robustness of the proposed signaling methodology for wireline interfaces. In the first part of this work, a 7.5 Gb/s hybrid NRZ/MT transceiver (TRX) for multi-drop bus (MDB) memory interfaces is designed and fabricated in 40 nm CMOS technology. Reducing the complexity of the equalization circuitry on the receiver (RX) side, the proposed architecture achieves 1 pJ/bit link efficiency for a MDB channel bearing 45 dB loss at 2.5 GHz. The measurement results of the first prototype confirm that NRZ/MT serial data TRX can offer an energy-efficient solution for MDB memory interfaces. Motivated by the satisfying results of the first prototype, in the second phase of this research we have exploited the properties of multi-tone signaling, especially orthogonality among different sub-bands, to reduce the effect of crosstalk in high-dense wireline interconnects. A four-channel transceiver has been implemented in a standard CMOS 40 nm technology in order to demonstrate the performance of NRZ/MT signaling in presence of high channel loss and strong crosstalk noise. The proposed system achieves 1 pJ/bit power efficiency, while communicating over a MDB memory channel at 36 Gb/s aggregate data rate

Infoscience - École polytechnique fédérale de Lausanne

Toward realizing power scalable and energy proportional high-speed wireline links

Author: Anand Tejasvi
Publication venue
Publication date: 01/12/2015
Field of study

Growing computational demand and proliferation of cloud computing has placed high-speed serial links at the center stage. Due to saturating energy efficiency improvements over the last five years, increasing the data throughput comes at the cost of power consumption. Conventionally, serial link power can be reduced by optimizing individual building blocks such as output drivers, receiver, or clock generation and distribution. However, this approach yields very limited efficiency improvement. This dissertation takes an alternative approach toward reducing the serial link power. Instead of optimizing the power of individual building blocks, power of the entire serial link is reduced by exploiting serial link usage by the applications. It has been demonstrated that serial links in servers are underutilized. On average, they are used only 15% of the time, i.e. these links are idle for approximately 85% of the time. Conventional links consume power during idle periods to maintain synchronization between the transmitter and the receiver. However, by powering-off the link when idle and powering it back when needed, power consumption of the serial link can be scaled proportionally to its utilization. This approach of rapid power state transitioning is known as the rapid-on/off approach. For the rapid-on/off to be effective, ideally the power-on time, off-state power, and power state transition energy must all be close to zero. However, in practice, it is very difficult to achieve these ideal conditions. Work presented in this dissertation addresses these challenges. When this research work was started (2011-12), there were only a couple of research papers available in the area of rapid-on/off links. Systematic study or design of a rapid power state transitioning in serial links was not available in the literature. Since rapid-on/off with nanoseconds granularity is not a standard in any wireline communication, even the popular test equipment does not support testing any such feature, neither any formal measurement methodology was available. All these circumstances made the beginning difficult. However, these challenges provided a unique opportunity to explore new architectural techniques and identify trade-offs. The key contributions of this dissertation are as follows. The first and foremost contribution is understanding the underlying limitations of saturating energy efficiency improvements in serial links and why there is a compelling need to find alternative ways to reduce the serial link power. The second contribution is to identify potential power saving techniques and evaluate the challenges they pose and the opportunities they present. The third contribution is the design of a 5Gb/s transmitter with a rapid-on/off feature. The transmitter achieves rapid-on/off capability in voltage mode output driver by using a fast-digital regulator, and in the clock multiplier by accurate frequency pre-setting and periodic reference insertion. To ease timing requirements, an improved edge replacement logic circuit for the clock multiplier is proposed. Mathematical modeling of power-on time as a function of various circuit parameters is also discussed. The proposed transmitter demonstrates energy proportional operation over wide variations of link utilization, and is, therefore, suitable for energy efficient links. Fabricated in 90nm CMOS technology, the voltage mode driver, and the clock multiplier achieve power-on-time of only 2ns and 10ns, respectively. This dissertation highlights key trade-off in the clock multiplier architecture, to achieve fast power-on-lock capability at the cost of jitter performance. The fourth contribution is the design of a 7GHz rapid-on/off LC-PLL based clock multi- plier. The phase locked loop (PLL) based multiplier was developed to overcome the limita- tions of the MDLL based approach. Proposed temperature compensated LC-PLL achieves power-on-lock in 1ns. The fifth and biggest contribution of this dissertation is the design of a 7Gb/s embedded clock transceiver, which achieves rapid-on/off capability in LC-PLL, current-mode transmit- ter and receiver. It was the first reported design of a complete transceiver, with an embedded clock architecture, having rapid-on/off capability. Background phase calibration technique in PLL and CDR phase calibration logic in the receiver enable instantaneous lock on power-on. The proposed transceiver demonstrates power scalability with a wide range of link utiliza- tion and, therefore, helps in improving overall system efficiency. Fabricated in 65nm CMOS technology, the 7Gb/s transceiver achieves power-on-lock in less than 20ns. The transceiver achieves power scaling by 44x (63.7mW-to-1.43mW) and energy efficiency degradation by only 2.2x (9.1pJ/bit-to-20.5pJ/bit), when the effective data rate (link utilization) changes by 100x (7Gb/s-to-70Mb/s). The sixth and final contribution is the design of a temperature sensor to compensate the frequency drifts due to temperature variations, during long power-off periods, in the fast power-on-lock LC-PLL. The proposed self-referenced VCO-based temperature sensor is designed with all digital logic gates and achieves low supply sensitivity. This sensor is suitable for integration in processor and DRAM environments. The proposed sensor works on the principle of directly converting temperature information to frequency and finally to digital bits. A novel sensing technique is proposed in which temperature information is acquired by creating a threshold voltage difference between the transistors used in the oscillators. Reduced supply sensitivity is achieved by employing junction capacitance, and the overhead of voltage regulators and an external ideal reference frequency is avoided. The effect of VCO phase noise on the sensor resolution is mathematically evaluated. Fabricated in the 65nm CMOS process, the prototype can operate with a supply ranging from 0.85V to 1.1V, and it achieves a supply sensitivity of 0.034oC/mV and an inaccuracy of ±0.9oC and ±2.3oC from 0-100oC after 2-point calibration, with and without static nonlinearity correction, respectively. It achieves a resolution of 0.3oC, resolution FoM of 0.3(nJ/conv)res2 , and measurement (conversion) time of 6.5μs

Illinois Digital Environment for Access to Learning and Scholarship Repository

Design Techniques for High Performance Serial Link Transceivers

Author: Min Byungho
Publication venue
Publication date: 21/08/2017
Field of study

Increasing data rates over electrical channels with significant frequency-dependent loss is difficult due to excessive inter-symbol interference (ISI). In order to achieve sufficient link margins at high rates, I/O system designers implement equalization in the transmitters and are motivated to consider more spectrally-efficient modulation formats relative to the common PAM-2 scheme, such as PAM-4 and duobinary. The first work, reviews when to consider PAM-4 and duobinary formats, as the modulation scheme which yields the highest system margins at a given data rate is a function of the channel loss profile, and presents a 20Gb/s triple-mode transmitter capable of efficiently implementing these three modulation schemes and three-tap feedforward equalization. A statistical link modeling tool, which models ISI, crosstalk, random noise, and timing jitter, is developed to compare the three common modulation formats operating on electrical backplane channel models. In order to improve duobinary modulation efficiency, a low-power quarter-rate duobinary precoder circuit is proposed which provides significant timing margin improvement relative to full-rate precoders. Also as serial I/O data rates scale above 10 Gb/s, crosstalk between neighboring channels degrades system bit-error rate (BER) performance. The next work presents receive-side circuitry which merges the cancellation of both near-end and far-end crosstalk (NEXT/FEXT) and can automatically adapt to different channel environments and variations in process, voltage, and temperature. NEXT cancellation is realized with a novel 3-tap FIR filter which combines two traditional FIR filter taps and a continuous-time band-pass filter IIR tap for efficient crosstalk cancellation, with all filter tap coefficients automatically determined via an ondie sign-sign least-mean-square (SS-LMS) adaptation engine. FEXT cancellation is realized by coupling the aggressor signal through a differentiator circuit whose gain is automatically adjusted with a power-detection-based adaptation loop. In conclusion, the proposed architectures in the transmitter side and receiver side together are to be good solution in the high speed I/O serial links to improve the performance by overcome the physical channel loss and adjacent channel noise as the system becomes complicated

Texas A&M Repository

Optics in Computing:from photonic Network-on-Chip to Chip-to-Chip Interconnects and Disintegrated Architectures

Author: Alexoudi Theonitsa
Kanellos George T.
Maniotis Pavlos
Miliou Amalia
Mitsolidou Charoula
Moralis-Pegios Miltiadis
Mourgias-Alexandris George
Pitris Stelios
Pleros Nikos
Terzenidis Nikolaos
Vagionas Christos
Vyrsokinos Konstantinos
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 15/10/2018
Field of study

Explore Bristol Research

Recommended from our members

Design of Power-Efficient Optical Transceivers and Design of High-Linearity Wireless Wideband Receivers

Author: Zhang Yudong
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2021
Field of study

The combination of silicon photonics and advanced heterogeneous integration is promising for next-generation disaggregated data centers that demand large scale, high throughput, and low power. In this dissertation, we discuss the design and theory of power-efficient optical transceivers with System-in-Package (SiP) 2.5D integration. Combining prior arts and proposed circuit techniques, a receiver chip and a transmitter chip including two 10 Gb/s data channels and one 2.5 GHz clocking channel are designed and implemented in 28 nm CMOS technology. An innovative transimpedance amplifier (TIA) and a single-ended to differential (S2D) converter are proposed and analyzed for a low-voltage high-sensitivity receiver; a four-to-one serializer, programmable output drivers, AC coupling units, and custom pads are implemented in a low-power transmitter; an improved quadrature locked loop (QLL) is employed to generate accurate quadrature clocks. In addition, we present an analysis for inverter-based shunt-feedback TIA to explicitly depict the trade-off among sensitivity, data rate, and power consumption. At last, the research on CDR-based clocking schemes for optical links is also discussed. We introduce prior arts and propose a power-efficient clocking scheme based on an injection-locked phase rotator. Next, we analyze injection-locked ring oscillators (ILROs) that have been widely used for quadrature clock generators (QCGs) in multi-lane optical or wireline transceivers due to their low power, low area, and technology scalability. The asymmetrical or partial injection locking from 2 phases to 4 phases results in imbalances in amplitude and phase. We propose a modified frequency-domain analysis to provide intuitive insight into the performance design trade-offs. The analysis is validated by comparing analytical predictions with simulations for an ILRO-based QCG in 28 nm CMOS technology. This dissertation also discusses the design of high-linearity wireless wideband receivers. An out-of-band (OB) IM3 cancellation technique is proposed and analyzed. By exploiting a baseband auxiliary path (AP) with a high-pass feature, the in-band (IB) desired signal and out-of-band interferers are split. OB third-order intermodulation products (IM3) are reconstructed in the AP and cancelled in the baseband (BB). A 0.5-2.5 GHz frequency-translational noise-cancelling (FTNC) receiver is implemented in 65nm CMOS to demonstrate the proposed approach. It consumes 36 mW without cancellation at 1 GHz LO frequency and 1.2 V supply, and it achieves 8.8 MHz baseband bandwidth, 40dB gain, 3.3dB NF, 5dBm OB IIP3, and −6.5dBm OB B1dB. After IM3 cancellation, the effective OB-IIP3 increases to 32.5 dBm with an extra 34 mW for narrow-band interferers (two tones). For wideband interferers, 18.8 dB cancellation is demonstrated over 10 MHz with two −15 dBm modulated interferers. The local oscillator (LO) leakage is −92 dBm and −88 dB at 1 GHz and 2 GHz LO respectively. In summary, this technique achieves both high OB linearity and good LO isolation

Columbia University Academic Commons

Solid State Circuits Technologies

Author
Publication venue: 'IntechOpen'
Publication date: 20/04/2021
Field of study

The evolution of solid-state circuit technology has a long history within a relatively short period of time. This technology has lead to the modern information society that connects us and tools, a large market, and many types of products and applications. The solid-state circuit technology continuously evolves via breakthroughs and improvements every year. This book is devoted to review and present novel approaches for some of the main issues involved in this exciting and vigorous technology. The book is composed of 22 chapters, written by authors coming from 30 different institutions located in 12 different countries throughout the Americas, Asia and Europe. Thus, reflecting the wide international contribution to the book. The broad range of subjects presented in the book offers a general overview of the main issues in modern solid-state circuit technology. Furthermore, the book offers an in depth analysis on specific subjects for specialists. We believe the book is of great scientific and educational value for many readers. I am profoundly indebted to the support provided by all of those involved in the work. First and foremost I would like to acknowledge and thank the authors who worked hard and generously agreed to share their results and knowledge. Second I would like to express my gratitude to the Intech team that invited me to edit the book and give me their full support and a fruitful experience while working together to combine this book

Directory of Open Access Books (DOAB)

Crosstalk cancelling voltage-mode driver for multi-Gbps parallel DRAM interface

Author: Buckwalter
Jung
Oh
Oh
Publication venue: 'Wiley'
Publication date
Field of study

Crossref