202 research outputs found

    Design Techniques for Energy Efficient Multi-GB/S Serial I/O Transceivers

    Get PDF
    Total I/O bandwidth demand is growing in high-performance systems due to the emergence of many-core microprocessors and in mobile devices to support the next generation of multi-media features. High-speed serial I/O energy efficiency must improve in order to enable continued scaling of these parallel computing platforms in applications ranging from data centers to smart mobile devices. The first work, a low-power forwarded-clock I/O transceiver architecture is presented that employs a high degree of output/input multiplexing, supply-voltage scaling with data rate, and low-voltage circuit techniques to enable low-power operation. The transmitter utilizes a 4:1 output multiplexing voltage-mode driver along with 4-phase clocking that is efficiently generated from a passive poly-phase filter. The output driver voltage swing is accurately controlled from 100-200 mV_(ppd) using a low-voltage pseudo-differential regulator that employs a partial negative-resistance load for improved low frequency gain. 1:8 input de-multiplexing is performed at the receiver equalizer output with 8 parallel input samplers clocked from an 8-phase injection-locked oscillator that provides more than 1UI de-skew range. Low-power high-speed serial I/O transmitters which include equalization to compensate for channel frequency dependent loss are required to meet the aggressive link energy efficiency targets of future systems. The second work presents a low power serial link transmitter design that utilizes an output stage which combines a voltage-mode driver, which offers low static-power dissipation, and current-mode equalization, which offers low complexity and dynamic-power dissipation. The utilization of current-mode equalization decouples the equalization settings and termination impedance, allowing for a significant reduction in pre-driver complexity relative to segmented voltage-mode drivers. Proper transmitter series termination is set with an impedance control loop which adjusts the on-resistance of the output transistors in the driver voltage-mode portion. Further reductions in dynamic power dissipation are achieved through scaling the serializer and local clock distribution supply with data rate. Finally, it presents that a scalable quarter-rate transmitter employs an analog-controlled impedance-modulated 2-tap voltage-mode equalizer and achieves fast power-state transitioning with a replica-biased regulator and ILO clock generation. Capacitively-driven 2 mm global clock distribution and automatic phase calibration allows for aggressive supply scaling

    Clocking and Skew-Optimization For Source-Synchronous Simultaneous Bidirectional Links

    Get PDF
    There is continuous expansion of computing capabilities in mobile devices which demands higher I/O bandwidth and dense parallel links supporting higher data rates. Highspeed signaling leverages technology advancements to achieve higher data rates but is limited by the bandwidth of the electrical copper channel which have not scaled accordingly. To meet the continuous data-rate demand, Simultaneous Bi-directional (SBD) signaling technique is an attractive alternative relative to uni-directional signaling as it can work at lower clock speeds, exhibits better spectral efficiency and provides higher throughput in pad limited PCBs. For low-power and more robust system, the SBD transceiver should utilize forwarded clock system and per-pin de-skew circuits to correct the phase difference developed between the data and clock. The system can be configured in two roles, master and slave. To save more power, the system should have only one clock generator. The master has its own clock source and shares its clock to the slave through the clock channel, and the slave uses this forwarded clock to deserialize the inbound data and serialize the outbound data. A clock-to-data skew exists which can be corrected with a phase tracking CDR. This thesis presents a low-power implementation of forwarded clocking and clock-to-data skew optimization for a 40 Gbps SBD transceiver. The design is implemented in 28nm CMOS technology and consumes 8.8mW of power for 20 Gbps NRZ data at 0.9 V supply. The area occupied by the clocking 0.018 mm^2 area

    Toward realizing power scalable and energy proportional high-speed wireline links

    Get PDF
    Growing computational demand and proliferation of cloud computing has placed high-speed serial links at the center stage. Due to saturating energy efficiency improvements over the last five years, increasing the data throughput comes at the cost of power consumption. Conventionally, serial link power can be reduced by optimizing individual building blocks such as output drivers, receiver, or clock generation and distribution. However, this approach yields very limited efficiency improvement. This dissertation takes an alternative approach toward reducing the serial link power. Instead of optimizing the power of individual building blocks, power of the entire serial link is reduced by exploiting serial link usage by the applications. It has been demonstrated that serial links in servers are underutilized. On average, they are used only 15% of the time, i.e. these links are idle for approximately 85% of the time. Conventional links consume power during idle periods to maintain synchronization between the transmitter and the receiver. However, by powering-off the link when idle and powering it back when needed, power consumption of the serial link can be scaled proportionally to its utilization. This approach of rapid power state transitioning is known as the rapid-on/off approach. For the rapid-on/off to be effective, ideally the power-on time, off-state power, and power state transition energy must all be close to zero. However, in practice, it is very difficult to achieve these ideal conditions. Work presented in this dissertation addresses these challenges. When this research work was started (2011-12), there were only a couple of research papers available in the area of rapid-on/off links. Systematic study or design of a rapid power state transitioning in serial links was not available in the literature. Since rapid-on/off with nanoseconds granularity is not a standard in any wireline communication, even the popular test equipment does not support testing any such feature, neither any formal measurement methodology was available. All these circumstances made the beginning difficult. However, these challenges provided a unique opportunity to explore new architectural techniques and identify trade-offs. The key contributions of this dissertation are as follows. The first and foremost contribution is understanding the underlying limitations of saturating energy efficiency improvements in serial links and why there is a compelling need to find alternative ways to reduce the serial link power. The second contribution is to identify potential power saving techniques and evaluate the challenges they pose and the opportunities they present. The third contribution is the design of a 5Gb/s transmitter with a rapid-on/off feature. The transmitter achieves rapid-on/off capability in voltage mode output driver by using a fast-digital regulator, and in the clock multiplier by accurate frequency pre-setting and periodic reference insertion. To ease timing requirements, an improved edge replacement logic circuit for the clock multiplier is proposed. Mathematical modeling of power-on time as a function of various circuit parameters is also discussed. The proposed transmitter demonstrates energy proportional operation over wide variations of link utilization, and is, therefore, suitable for energy efficient links. Fabricated in 90nm CMOS technology, the voltage mode driver, and the clock multiplier achieve power-on-time of only 2ns and 10ns, respectively. This dissertation highlights key trade-off in the clock multiplier architecture, to achieve fast power-on-lock capability at the cost of jitter performance. The fourth contribution is the design of a 7GHz rapid-on/off LC-PLL based clock multi- plier. The phase locked loop (PLL) based multiplier was developed to overcome the limita- tions of the MDLL based approach. Proposed temperature compensated LC-PLL achieves power-on-lock in 1ns. The fifth and biggest contribution of this dissertation is the design of a 7Gb/s embedded clock transceiver, which achieves rapid-on/off capability in LC-PLL, current-mode transmit- ter and receiver. It was the first reported design of a complete transceiver, with an embedded clock architecture, having rapid-on/off capability. Background phase calibration technique in PLL and CDR phase calibration logic in the receiver enable instantaneous lock on power-on. The proposed transceiver demonstrates power scalability with a wide range of link utiliza- tion and, therefore, helps in improving overall system efficiency. Fabricated in 65nm CMOS technology, the 7Gb/s transceiver achieves power-on-lock in less than 20ns. The transceiver achieves power scaling by 44x (63.7mW-to-1.43mW) and energy efficiency degradation by only 2.2x (9.1pJ/bit-to-20.5pJ/bit), when the effective data rate (link utilization) changes by 100x (7Gb/s-to-70Mb/s). The sixth and final contribution is the design of a temperature sensor to compensate the frequency drifts due to temperature variations, during long power-off periods, in the fast power-on-lock LC-PLL. The proposed self-referenced VCO-based temperature sensor is designed with all digital logic gates and achieves low supply sensitivity. This sensor is suitable for integration in processor and DRAM environments. The proposed sensor works on the principle of directly converting temperature information to frequency and finally to digital bits. A novel sensing technique is proposed in which temperature information is acquired by creating a threshold voltage difference between the transistors used in the oscillators. Reduced supply sensitivity is achieved by employing junction capacitance, and the overhead of voltage regulators and an external ideal reference frequency is avoided. The effect of VCO phase noise on the sensor resolution is mathematically evaluated. Fabricated in the 65nm CMOS process, the prototype can operate with a supply ranging from 0.85V to 1.1V, and it achieves a supply sensitivity of 0.034oC/mV and an inaccuracy of ยฑ0.9oC and ยฑ2.3oC from 0-100oC after 2-point calibration, with and without static nonlinearity correction, respectively. It achieves a resolution of 0.3oC, resolution FoM of 0.3(nJ/conv)res2 , and measurement (conversion) time of 6.5ฮผs

    Power-Proportional Optical Links

    Get PDF
    The continuous increase in data transfer rate in short-reach links, such as chip-to-chip and between servers within a data-center, demands high-speed links. As power efficiency becomes ever more important in these links, power-efficient optical links need to be designed. Power efficiency in a link can be achieved by enabling power-proportional communication over the serial link. In power-proportional links, the power dissipated by a link is proportional to the amount of data communicated. Normally, data-rate demand is not constant, and the peak data-rate is not required all the time. If a link is not adapted according to the data-rate demand, there will be a fixed power dissipation, and the power efficiency of the link will degrade during the sub-maximal link utilization. Adapting links to real-time data-rate requirements reduces power dissipation. Power proportionality is achieved by scaling the power of the serial link linearly with the link utilization, and techniques such as variable data-rate and burst-mode can be adopted for this purpose. Links whose data rate (and hence power dissipation) can be varied in response to system demands are proposed in this work. Past works have presented rapidly reconfigurable bandwidth in variable data-rate receivers, allowing lower power dissipation for lower data-rate operation. However, maintaining synchronization during reconfiguration was not possible since previous approaches have introduced changes in front-end delay when they are reconfigured. This work presents a technique that allows rapid bandwidth adjustment while maintaining a near-constant delay through the receiver suitable for a power-scalable variable data-rate optical link. Measurements of a fabricated integrated circuit (IC) show nearly constant energy per bit across a 2ร— variation in data rate while introducing less than 10 % of a unit interval (UI) of delay variation. With continuously increasing data communication in data-centers, parallel optical links with ever-increasing per-lane data rates are being used to meet overall throughput demands. Simultaneously, power efficiency is becoming increasingly important for these links since they do not transmit useful data all the time. The burst-mode solution for vertical-cavity surface-emitting laser (VCSEL)-based point-to-point communication can be used to improve linksโ€™ energy efficiency during low link activity. The burst-mode technique for VCSEL-based links has not yet been deployed commercially. Past works have presented burst-mode solutions for single-channel receivers, allowing lower power dissipation during low link activity and solutions for fast activation of the receivers. However, this work presents a novel technique that allows rapid activation of a front-end and fast locking of a clock-and-data-recovery (CDR) for a multi-channel parallel link, utilizing opportunities arising from the parallel nature of many VCSEL-based links. The idea has been demonstrated through electrical and optical measurements of a fabricated IC at 10 Gbps, which show fast data detection and activation of the circuitry within 49 UIs while allowing the front-end to achieve better energy efficiency during low link activity. Simulation results are also presented in support of the proposed technique which allows the CDR to lock within 26 UIs from when it is powered on

    Design of a clock and data recovery circuit in 65 nm technology

    Get PDF
    As semiconductor fabrication technology develops, the demand for higher transmission data rates constantly increases; thus there is an urgent need for a power-efficient, robust and broad bandwidth chip-to-chip communication method. A lot of work has been done to address this issue as researchers strive for more integrated inter-IC communication technology with CMOS. A high-speed serial link (HSSL) can help meet this goal. The clock and data recovery circuit (CDR) is a critical component of the HSSL. CDR is built on the receiver end of the link after proper equalization. Its purpose is to extract clock signal which is not transmitted from the driver end and to use the extracted clock signal to sample the incoming data stream with optimal timing. In this thesis, the working mechanism of the CDR is described. A CDR consists of a phase detector, a charge pump, a loop filter and a voltage-controlled oscillator. This thesis includes an overview of all the building blocks of a PLL-based CDR, derivation of the mathematical formulations of the negative feedback loop, and a report on closed loop behavioral modeling of the entire CDR and implemented CDR building blocks at transistor level with TSMC 65 nm technology PDK with a 6.4 Gbps data rate. Also, this thesis provides a detailed noise analysis of the CDR. Lastly, some future work and possible design improvements are proposed

    Design of a clock and data recovery circuit in 65 nm technology

    Get PDF
    As semiconductor fabrication technology develops, the demand for higher transmission data rates constantly increases; thus there is an urgent need for a power-efficient, robust and broad bandwidth chip-to-chip communication method. A lot of work has been done to address this issue as researchers strive for more integrated inter-IC communication technology with CMOS. A high-speed serial link (HSSL) can help meet this goal. The clock and data recovery circuit (CDR) is a critical component of the HSSL. CDR is built on the receiver end of the link after proper equalization. Its purpose is to extract clock signal which is not transmitted from the driver end and to use the extracted clock signal to sample the incoming data stream with optimal timing. In this thesis, the working mechanism of the CDR is described. A CDR consists of a phase detector, a charge pump, a loop filter and a voltage-controlled oscillator. This thesis includes an overview of all the building blocks of a PLL-based CDR, derivation of the mathematical formulations of the negative feedback loop, and a report on closed loop behavioral modeling of the entire CDR and implemented CDR building blocks at transistor level with TSMC 65 nm technology PDK with a 6.4 Gbps data rate. Also, this thesis provides a detailed noise analysis of the CDR. Lastly, some future work and possible design improvements are proposed

    ์ฐจ์„ธ๋Œ€ HBM ์šฉ ๊ณ ์ง‘์ , ์ €์ „๋ ฅ ์†ก์ˆ˜์‹ ๊ธฐ ์„ค๊ณ„

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ (๋ฐ•์‚ฌ) -- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ๊ณต๊ณผ๋Œ€ํ•™ ์ „๊ธฐยท์ •๋ณด๊ณตํ•™๋ถ€, 2020. 8. ์ •๋•๊ท .This thesis presents design techniques for high-density power-efficient transceiver for the next-generation high bandwidth memory (HBM). Unlike the other memory interfaces, HBM uses a 3D-stacked package using through-silicon via (TSV) and a silicon interposer. The transceiver for HBM should be able to solve the problems caused by the 3D-stacked package and TSV. At first, a data (DQ) receiver for HBM with a self-tracking loop that tracks a phase skew between DQ and data strobe (DQS) due to a voltage or thermal drift is proposed. The self-tracking loop achieves low power and small area by uti-lizing an analog-assisted baud-rate phase detector. The proposed pulse-to-charge (PC) phase detector (PD) converts the phase skew to a voltage differ-ence and detects the phase skew from the voltage difference. An offset calibra-tion scheme that can compensates for a mismatch of the PD is also proposed. The proposed calibration scheme operates without any additional sensing cir-cuits by taking advantage of the write training of HBM. Fabricated in 65 nm CMOS, the DQ receiver shows a power efficiency of 370 fJ/b at 4.8 Gb/s and occupies 0.0056 mm2. The experimental results show that the DQ receiver op-erates without any performance degradation under a ยฑ 10% supply variation. In a second prototype IC, a high-density transceiver for HBM with a feed-forward-equalizer (FFE)-combined crosstalk (XT) cancellation scheme is pre-sented. To compensate for the XT, the transmitter pre-distorts the amplitude of the FFE output according to the XT. Since the proposed XT cancellation (XTC) scheme reuses the FFE implemented to equalize the channel loss, additional circuits for the XTC is minimized. Thanks to the XTC scheme, a channel pitch can be significantly reduced, allowing for the high channel density. Moreover, the 3D-staggered channel structure removes the ground layer between the verti-cally adjacent channels, which further reduces a cross-sectional area of the channel per lane. The test chip including 6 data lanes is fabricated in 65 nm CMOS technology. The 6-mm channels are implemented on chip to emulate the silicon interposer between the HBM and the processor. The operation of the XTC scheme is verified by simultaneously transmitting 4-Gb/s data to the 6 consecutive channels with 0.5-um pitch and the XTC scheme reduces the XT-induced jitter up to 78 %. The measurement result shows that the transceiver achieves the throughput of 8 Gb/s/um. The transceiver occupies 0.05 mm2 for 6 lanes and consumes 36.6 mW at 6 x 4 Gb/s.๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ์ฐจ์„ธ๋Œ€ HBM์„ ์œ„ํ•œ ๊ณ ์ง‘์  ์ €์ „๋ ฅ ์†ก์ˆ˜์‹ ๊ธฐ ์„ค๊ณ„ ๋ฐฉ๋ฒ•์„ ์ œ์•ˆํ•œ๋‹ค. ์ฒซ ๋ฒˆ์งธ๋กœ, ์ „์•• ๋ฐ ์˜จ๋„ ๋ณ€ํ™”์— ์˜ํ•œ ๋ฐ์ดํ„ฐ์™€ ํด๋Ÿญ ๊ฐ„ ์œ„์ƒ ์ฐจ์ด๋ฅผ ๋ณด์ƒํ•  ์ˆ˜ ์žˆ๋Š” ์ž์ฒด ์ถ”์  ๋ฃจํ”„๋ฅผ ๊ฐ€์ง„ ๋ฐ์ดํ„ฐ ์ˆ˜์‹ ๊ธฐ๋ฅผ ์ œ์•ˆํ•œ๋‹ค. ์ œ์•ˆํ•˜๋Š” ์ž์ฒด ์ถ”์  ๋ฃจํ”„๋Š” ๋ฐ์ดํ„ฐ ์ „์†ก ์†๋„์™€ ๊ฐ™์€ ์†๋„๋กœ ๋™์ž‘ํ•˜๋Š” ์œ„์ƒ ๊ฒ€์ถœ๊ธฐ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ „๋ ฅ ์†Œ๋ชจ์™€ ๋ฉด์ ์„ ์ค„์˜€๋‹ค. ๋˜ํ•œ ๋ฉ”๋ชจ๋ฆฌ์˜ ์“ฐ๊ธฐ ํ›ˆ๋ จ (write training) ๊ณผ์ •์„ ์ด์šฉํ•˜์—ฌ ํšจ๊ณผ์ ์œผ๋กœ ์œ„์ƒ ๊ฒ€์ถœ๊ธฐ์˜ ์˜คํ”„์…‹์„ ๋ณด์ƒํ•  ์ˆ˜ ์žˆ๋Š” ๋ฐฉ๋ฒ•์„ ์ œ์•ˆํ•œ๋‹ค. ์ œ์•ˆํ•˜๋Š” ๋ฐ์ดํ„ฐ ์ˆ˜์‹ ๊ธฐ๋Š” 65 nm ๊ณต์ •์œผ๋กœ ์ œ์ž‘๋˜์–ด 4.8 Gb/s์—์„œ 370 fJ/b์„ ์†Œ๋ชจํ•˜์˜€๋‹ค. ๋˜ํ•œ 10 % ์˜ ์ „์•• ๋ณ€ํ™”์— ๋Œ€ํ•˜์—ฌ ์•ˆ์ •์ ์œผ๋กœ ๋™์ž‘ํ•˜๋Š” ๊ฒƒ์„ ํ™•์ธํ•˜์˜€๋‹ค. ๋‘ ๋ฒˆ์งธ๋กœ, ํ”ผ๋“œ ํฌ์›Œ๋“œ ์ดํ€„๋ผ์ด์ €์™€ ๊ฒฐํ•ฉ๋œ ํฌ๋กœ์Šค ํ† ํฌ ๋ณด์ƒ ๋ฐฉ์‹์„ ํ™œ์šฉํ•œ ๊ณ ์ง‘์  ์†ก์ˆ˜์‹ ๊ธฐ๋ฅผ ์ œ์•ˆํ•œ๋‹ค. ์ œ์•ˆํ•˜๋Š” ์†ก์‹ ๊ธฐ๋Š” ํฌ๋กœ์Šค ํ† ํฌ ํฌ๊ธฐ์— ํ•ด๋‹นํ•˜๋Š” ๋งŒํผ ์†ก์‹ ๊ธฐ ์ถœ๋ ฅ์„ ์™œ๊ณกํ•˜์—ฌ ํฌ๋กœ์Šค ํ† ํฌ๋ฅผ ๋ณด์ƒํ•œ๋‹ค. ์ œ์•ˆํ•˜๋Š” ํฌ๋กœ์Šค ํ† ํฌ ๋ณด์ƒ ๋ฐฉ์‹์€ ์ฑ„๋„ ์†์‹ค์„ ๋ณด์ƒํ•˜๊ธฐ ์œ„ํ•ด ๊ตฌํ˜„๋œ ํ”ผ๋“œ ํฌ์›Œ๋“œ ์ดํ€„๋ผ์ด์ €๋ฅผ ์žฌํ™œ์šฉํ•จ์œผ๋กœ์จ ์ถ”๊ฐ€์ ์ธ ํšŒ๋กœ๋ฅผ ์ตœ์†Œํ™”ํ•œ๋‹ค. ์ œ์•ˆํ•˜๋Š” ์†ก์ˆ˜์‹ ๊ธฐ๋Š” ํฌ๋กœ์Šค ํ† ํฌ๊ฐ€ ๋ณด์ƒ ๊ฐ€๋Šฅํ•˜๊ธฐ ๋•Œ๋ฌธ์—, ์ฑ„๋„ ๊ฐ„๊ฒฉ์„ ํฌ๊ฒŒ ์ค„์—ฌ ๊ณ ์ง‘์  ํ†ต์‹ ์„ ๊ตฌํ˜„ํ•˜์˜€๋‹ค. ๋˜ํ•œ ์ง‘์ ๋„๋ฅผ ๋” ์ฆ๊ฐ€์‹œํ‚ค๊ธฐ ์œ„ํ•ด ์„ธ๋กœ๋กœ ์ธ์ ‘ํ•œ ์ฑ„๋„ ์‚ฌ์ด์˜ ์ฐจํ ์ธต์„ ์ œ๊ฑฐํ•œ ์ ์ธต ์ฑ„๋„ ๊ตฌ์กฐ๋ฅผ ์ œ์•ˆํ•œ๋‹ค. 6๊ฐœ์˜ ์†ก์ˆ˜์‹ ๊ธฐ๋ฅผ ํฌํ•จํ•œ ํ”„๋กœํ† ํƒ€์ž… ์นฉ์€ 65 nm ๊ณต์ •์œผ๋กœ ์ œ์ž‘๋˜์—ˆ๋‹ค. HBM๊ณผ ํ”„๋กœ์„ธ์„œ ์‚ฌ์ด์˜ silicon interposer channel ์„ ๋ชจ์‚ฌํ•˜๊ธฐ ์œ„ํ•œ 6 mm ์˜ ์ฑ„๋„์ด ์นฉ ์œ„์— ๊ตฌํ˜„๋˜์—ˆ๋‹ค. ์ œ์•ˆํ•˜๋Š” ํฌ๋กœ์Šค ํ† ํฌ ๋ณด์ƒ ๋ฐฉ์‹์€ 0.5 um ๊ฐ„๊ฒฉ์˜ 6๊ฐœ์˜ ์ธ์ ‘ํ•œ ์ฑ„๋„์— ๋™์‹œ์— ๋ฐ์ดํ„ฐ๋ฅผ ์ „์†กํ•˜์—ฌ ๊ฒ€์ฆ๋˜์—ˆ์œผ๋ฉฐ, ํฌ๋กœ์Šค ํ† ํฌ๋กœ ์ธํ•œ ์ง€ํ„ฐ๋ฅผ ์ตœ๋Œ€ 78 % ๊ฐ์†Œ์‹œ์ผฐ๋‹ค. ์ œ์•ˆํ•˜๋Š” ์†ก์ˆ˜์‹ ๊ธฐ๋Š” 8 Gb/s/um ์˜ ์ฒ˜๋ฆฌ๋Ÿ‰์„ ๊ฐ€์ง€๋ฉฐ 6 ๊ฐœ์˜ ์†ก์ˆ˜์‹ ๊ธฐ๊ฐ€ ์ด 36.6 mW์˜ ์ „๋ ฅ์„ ์†Œ๋ชจํ•˜์˜€๋‹ค.CHAPTER 1 INTRODUCTION 1 1.1 MOTIVATION 1 1.2 THESIS ORGANIZATION 4 CHAPTER 2 BACKGROUND ON HIGH-BANDWIDTH MEMORY 6 2.1 OVERVIEW 6 2.2 TRANSCEIVER ARCHITECTURE 10 2.3 READ/WRITE OPERATION 15 2.3.1 READ OPERATION 15 2.3.2 WRITE OPERATION 19 CHAPTER 3 BACKGROUNDS ON COUPLED WIRES 21 3.1 GENERALIZED MODEL 21 3.2 EFFECT OF CROSSTALK 26 CHAPTER 4 DQ RECEIVER WITH BAUD-RATE SELF-TRACKING LOOP 29 4.1 OVERVIEW 29 4.2 FEATURES OF DQ RECEIVER FOR HBM 33 4.3 PROPOSED PULSE-TO-CHARGE PHASE DETECTOR 35 4.3.1 OPERATION OF PULSE-TO-CHARGE PHASE DETECTOR 35 4.3.2 OFFSET CALIBRATION 37 4.3.3 OPERATION SEQUENCE 39 4.4 CIRCUIT IMPLEMENTATION 42 4.5 MEASUREMENT RESULT 46 CHAPTER 5 HIGH-DENSITY TRANSCEIVER FOR HBM WITH 3D-STAGGERED CHANNEL AND CROSSTALK CANCELLATION SCHEME 57 5.1 OVERVIEW 57 5.2 PROPOSED 3D-STAGGERED CHANNEL 61 5.2.1 IMPLEMENTATION OF 3D-STAGGERED CHANNEL 61 5.2.2 CHANNEL CHARACTERISTICS AND MODELING 66 5.3 PROPOSED FEED-FORWARD-EQUALIZER-COMBINED CROSSTALK CANCELLATION SCHEME 72 5.4 CIRCUIT IMPLEMENTATION 77 5.4.1 OVERALL ARCHITECTURE 77 5.4.2 TRANSMITTER WITH FFE-COMBINED XTC 79 5.4.3 RECEIVER 81 5.5 MEASUREMENT RESULT 82 CHAPTER 6 CONCLUSION 93 BIBLIOGRAPHY 95 ์ดˆ ๋ก 102Docto

    Hybrid NRZ/Multi-Tone Signaling for High-Speed Low-Power Wireline Transceivers

    Get PDF
    Over the past few decades, incessant growth of Internet networking traffic and High-Performance Computing (HPC) has led to a tremendous demand for data bandwidth. Digital communication technologies combined with advanced integrated circuit scaling trends have enabled the semiconductor and microelectronic industry to dramatically scale the bandwidth of high-loss interfaces such as Ethernet, backplane, and Digital Subscriber Line (DSL). The key to achieving higher bandwidth is to employ equalization technique to compensate the channel impairments such as Inter-Symbol Interference (ISI), crosstalk, and environmental noise. Therefore, todayรขs advanced input/outputs (I/Os) has been equipped with sophisticated equalization techniques to push beyond the uncompensated bandwidth of the system. To this end, process scaling has continually increased the data processing capability and improved the I/O performance over the last 15 years. However, since the channel bandwidth has not scaled with the same pace, the required signal processing and equalization circuitry becomes more and more complicated. Thereby, the energy efficiency improvements are largely offset by the energy needed to compensate channel impairments. In this design paradigm, re-thinking about the design strategies in order to not only satisfy the bandwidth performance, but also to improve power-performance becomes an important necessity. It is well known in communication theory that coding and signaling schemes have the potential to provide superior performance over band-limited channels. However, the choice of the optimum data communication algorithm should be considered by accounting for the circuit level power-performance trade-offs. In this thesis we have investigated the application of new algorithm and signaling schemes in wireline communications, especially for communication between microprocessors, memories, and peripherals. A new hybrid NRZ/Multi-Tone (NRZ/MT) signaling method has been developed during the course of this research. The system-level and circuit-level analysis, design, and implementation of the proposed signaling method has been performed in the frame of this work, and the silicon measurement results have proved the efficiency and the robustness of the proposed signaling methodology for wireline interfaces. In the first part of this work, a 7.5 Gb/s hybrid NRZ/MT transceiver (TRX) for multi-drop bus (MDB) memory interfaces is designed and fabricated in 40 nm CMOS technology. Reducing the complexity of the equalization circuitry on the receiver (RX) side, the proposed architecture achieves 1 pJ/bit link efficiency for a MDB channel bearing 45 dB loss at 2.5 GHz. The measurement results of the first prototype confirm that NRZ/MT serial data TRX can offer an energy-efficient solution for MDB memory interfaces. Motivated by the satisfying results of the first prototype, in the second phase of this research we have exploited the properties of multi-tone signaling, especially orthogonality among different sub-bands, to reduce the effect of crosstalk in high-dense wireline interconnects. A four-channel transceiver has been implemented in a standard CMOS 40 nm technology in order to demonstrate the performance of NRZ/MT signaling in presence of high channel loss and strong crosstalk noise. The proposed system achieves 1 pJ/bit power efficiency, while communicating over a MDB memory channel at 36 Gb/s aggregate data rate

    Multi-gigabit CMOS analog-to-digital converter and mixed-signal demodulator for low-power millimeter-wave communication systems

    Get PDF
    The objective of the research is to develop high-speed ADCs and mixed-signal demodulator for multi-gigabit communication systems using millimeter-wave frequency bands in standard CMOS technology. With rapid advancements in semiconductor technologies, mobile communication devices have become more versatile, portable, and inexpensive over the last few decades. However, plagued by the short lifetime of batteries, low power consumption has become an extremely important specification in developing mobile communication devices. The ever-expanding demand of consumers to access and share information ubiquitously at faster speeds requires higher throughputs, increased signal-processing functionalities at lower power and lower costs. In todayโ€™s technology, high-speed signal processing and data converters are incorporated in almost all modern multi-gigabit communication systems. They are key enabling technologies for scalable digital design and implementation of baseband signal processors. Ultimately, the merits of a high performance mixed-signal receiver, such as data rate, sensitivity, signal dynamic range, bit-error rate, and power consumption, are directly related to the quality of the embedded ADCs. Therefore, this dissertation focuses on the analysis and design of high-speed ADCs and a novel broadband mixed-signal demodulator with a fully-integrated DSP composed of low-cost CMOS circuitry. The proposed system features a novel dual-mode solution to demodulate multi-gigabit BPSK and ASK signals. This approach reduces the resolution requirement of high-speed ADCs, while dramatically reducing its power consumption for multi-gigabit wireless communication systems.PhDGee-Kung Chang - Committee Chair; Chang-Ho Lee - Committee Member; Geoffrey Ye Li - Committee Member; Paul A. Kohl - Committee Member; Shyh-Chiang Shen - Committee Membe

    Design of Low-Power NRZ/PAM-4 Wireline Transmitters

    Get PDF
    Rapid growing demand for instant multimedia access in a myriad of digital devices has pushed the need for higher bandwidth in modern communication hardwares ranging from short-reach (SR) memory/storage interfaces to long-reach (LR) data center Ethernets. At the same time, comprehensive design optimization of link system that meets the energy-efficiency is required for mobile computing and low operational cost at datacenters. This doctoral study consists of design of two low-swing wireline transmitters featuring a low-power clock distribution and 2-tap equalization in energy-efficient manners up to 20-Gb/s operation. In spite of the reduced signaling power in the voltage-mode (VM) transmit driver, the presence of the segment selection logic still diminishes the power saving benefit. The first work presents a scalable VM transmitter which offers low static power dissipation and adopts an impedance-modulated 2-tap equalizer with analog tap control, thereby obviating driver segmentation and reducing pre-driver complexity and dynamic power. Per-channel quadrature clock generation with injection-locked oscillators (ILO) allows the generation of rail-to-rail quadrature clocks. Energy efficiency is further improved with capacitively driven low-swing global clock distribution and supply scaling at lower data rates, while output eye quality is maintained at low voltages with automatic phase calibration of the local ILO-generated quarter-rate clocks. A prototype fabricated in a general purpose 65 nm CMOS process includes a 2 mm global clock distribution network and two transmitters that support an output swing range of 100-300mV with up to 12-dB of equalization. The transmitters achieve 8-16 Gb/s operation at 0.65-1.05 pJ/b energy efficiency. The second work involves a dual-mode NRZ/PAM-4 differential low-swing voltage-mode (VM) transmitter. The pulse-selected output multiplexing allows reduction of power supply and deterministic jitter caused by large on-chip parasitic inherent in the transmission-gate-based multiplexers in the earlier work. Analog impedance control replica circuits running in the background produce gate-biasing voltages that control the peaking ratio for 2-tap feed-forward equalization and PAM-4 symbol levels for high-linearity. This analog control also allows for efficient generation of the middle levels in PAM-4 operation with good linearity quantified by level separation mismatch ratio of 95%. In NRZ mode, 2-tap feedforward equalization is configurable in high-performance controlled-impedance or energy-efficient impedance-modulated settings to provide performance scalability. Analytic design consideration on dynamic power, data-rate, mismatch, and output swing brings optimal performance metric on the given technology node. The proof-of-concept prototype is verified on silicon with 65 nm CMOS process with improved performance in speed and energy-efficiency owing to double-stack NMOS transistors in the output stage. The transmitter consumes as low as 29.6mW in 20-Gb/s NRZ and 25.5mW in the 28-Gb/s PAM-4 operations
    • โ€ฆ
    corecore