5 research outputs found

    Toward realizing power scalable and energy proportional high-speed wireline links

    Get PDF
    Growing computational demand and proliferation of cloud computing has placed high-speed serial links at the center stage. Due to saturating energy efficiency improvements over the last five years, increasing the data throughput comes at the cost of power consumption. Conventionally, serial link power can be reduced by optimizing individual building blocks such as output drivers, receiver, or clock generation and distribution. However, this approach yields very limited efficiency improvement. This dissertation takes an alternative approach toward reducing the serial link power. Instead of optimizing the power of individual building blocks, power of the entire serial link is reduced by exploiting serial link usage by the applications. It has been demonstrated that serial links in servers are underutilized. On average, they are used only 15% of the time, i.e. these links are idle for approximately 85% of the time. Conventional links consume power during idle periods to maintain synchronization between the transmitter and the receiver. However, by powering-off the link when idle and powering it back when needed, power consumption of the serial link can be scaled proportionally to its utilization. This approach of rapid power state transitioning is known as the rapid-on/off approach. For the rapid-on/off to be effective, ideally the power-on time, off-state power, and power state transition energy must all be close to zero. However, in practice, it is very difficult to achieve these ideal conditions. Work presented in this dissertation addresses these challenges. When this research work was started (2011-12), there were only a couple of research papers available in the area of rapid-on/off links. Systematic study or design of a rapid power state transitioning in serial links was not available in the literature. Since rapid-on/off with nanoseconds granularity is not a standard in any wireline communication, even the popular test equipment does not support testing any such feature, neither any formal measurement methodology was available. All these circumstances made the beginning difficult. However, these challenges provided a unique opportunity to explore new architectural techniques and identify trade-offs. The key contributions of this dissertation are as follows. The first and foremost contribution is understanding the underlying limitations of saturating energy efficiency improvements in serial links and why there is a compelling need to find alternative ways to reduce the serial link power. The second contribution is to identify potential power saving techniques and evaluate the challenges they pose and the opportunities they present. The third contribution is the design of a 5Gb/s transmitter with a rapid-on/off feature. The transmitter achieves rapid-on/off capability in voltage mode output driver by using a fast-digital regulator, and in the clock multiplier by accurate frequency pre-setting and periodic reference insertion. To ease timing requirements, an improved edge replacement logic circuit for the clock multiplier is proposed. Mathematical modeling of power-on time as a function of various circuit parameters is also discussed. The proposed transmitter demonstrates energy proportional operation over wide variations of link utilization, and is, therefore, suitable for energy efficient links. Fabricated in 90nm CMOS technology, the voltage mode driver, and the clock multiplier achieve power-on-time of only 2ns and 10ns, respectively. This dissertation highlights key trade-off in the clock multiplier architecture, to achieve fast power-on-lock capability at the cost of jitter performance. The fourth contribution is the design of a 7GHz rapid-on/off LC-PLL based clock multi- plier. The phase locked loop (PLL) based multiplier was developed to overcome the limita- tions of the MDLL based approach. Proposed temperature compensated LC-PLL achieves power-on-lock in 1ns. The fifth and biggest contribution of this dissertation is the design of a 7Gb/s embedded clock transceiver, which achieves rapid-on/off capability in LC-PLL, current-mode transmit- ter and receiver. It was the first reported design of a complete transceiver, with an embedded clock architecture, having rapid-on/off capability. Background phase calibration technique in PLL and CDR phase calibration logic in the receiver enable instantaneous lock on power-on. The proposed transceiver demonstrates power scalability with a wide range of link utiliza- tion and, therefore, helps in improving overall system efficiency. Fabricated in 65nm CMOS technology, the 7Gb/s transceiver achieves power-on-lock in less than 20ns. The transceiver achieves power scaling by 44x (63.7mW-to-1.43mW) and energy efficiency degradation by only 2.2x (9.1pJ/bit-to-20.5pJ/bit), when the effective data rate (link utilization) changes by 100x (7Gb/s-to-70Mb/s). The sixth and final contribution is the design of a temperature sensor to compensate the frequency drifts due to temperature variations, during long power-off periods, in the fast power-on-lock LC-PLL. The proposed self-referenced VCO-based temperature sensor is designed with all digital logic gates and achieves low supply sensitivity. This sensor is suitable for integration in processor and DRAM environments. The proposed sensor works on the principle of directly converting temperature information to frequency and finally to digital bits. A novel sensing technique is proposed in which temperature information is acquired by creating a threshold voltage difference between the transistors used in the oscillators. Reduced supply sensitivity is achieved by employing junction capacitance, and the overhead of voltage regulators and an external ideal reference frequency is avoided. The effect of VCO phase noise on the sensor resolution is mathematically evaluated. Fabricated in the 65nm CMOS process, the prototype can operate with a supply ranging from 0.85V to 1.1V, and it achieves a supply sensitivity of 0.034oC/mV and an inaccuracy of ยฑ0.9oC and ยฑ2.3oC from 0-100oC after 2-point calibration, with and without static nonlinearity correction, respectively. It achieves a resolution of 0.3oC, resolution FoM of 0.3(nJ/conv)res2 , and measurement (conversion) time of 6.5ฮผs

    Hybrid NRZ/Multi-Tone Signaling for High-Speed Low-Power Wireline Transceivers

    Get PDF
    Over the past few decades, incessant growth of Internet networking traffic and High-Performance Computing (HPC) has led to a tremendous demand for data bandwidth. Digital communication technologies combined with advanced integrated circuit scaling trends have enabled the semiconductor and microelectronic industry to dramatically scale the bandwidth of high-loss interfaces such as Ethernet, backplane, and Digital Subscriber Line (DSL). The key to achieving higher bandwidth is to employ equalization technique to compensate the channel impairments such as Inter-Symbol Interference (ISI), crosstalk, and environmental noise. Therefore, todayรขs advanced input/outputs (I/Os) has been equipped with sophisticated equalization techniques to push beyond the uncompensated bandwidth of the system. To this end, process scaling has continually increased the data processing capability and improved the I/O performance over the last 15 years. However, since the channel bandwidth has not scaled with the same pace, the required signal processing and equalization circuitry becomes more and more complicated. Thereby, the energy efficiency improvements are largely offset by the energy needed to compensate channel impairments. In this design paradigm, re-thinking about the design strategies in order to not only satisfy the bandwidth performance, but also to improve power-performance becomes an important necessity. It is well known in communication theory that coding and signaling schemes have the potential to provide superior performance over band-limited channels. However, the choice of the optimum data communication algorithm should be considered by accounting for the circuit level power-performance trade-offs. In this thesis we have investigated the application of new algorithm and signaling schemes in wireline communications, especially for communication between microprocessors, memories, and peripherals. A new hybrid NRZ/Multi-Tone (NRZ/MT) signaling method has been developed during the course of this research. The system-level and circuit-level analysis, design, and implementation of the proposed signaling method has been performed in the frame of this work, and the silicon measurement results have proved the efficiency and the robustness of the proposed signaling methodology for wireline interfaces. In the first part of this work, a 7.5 Gb/s hybrid NRZ/MT transceiver (TRX) for multi-drop bus (MDB) memory interfaces is designed and fabricated in 40 nm CMOS technology. Reducing the complexity of the equalization circuitry on the receiver (RX) side, the proposed architecture achieves 1 pJ/bit link efficiency for a MDB channel bearing 45 dB loss at 2.5 GHz. The measurement results of the first prototype confirm that NRZ/MT serial data TRX can offer an energy-efficient solution for MDB memory interfaces. Motivated by the satisfying results of the first prototype, in the second phase of this research we have exploited the properties of multi-tone signaling, especially orthogonality among different sub-bands, to reduce the effect of crosstalk in high-dense wireline interconnects. A four-channel transceiver has been implemented in a standard CMOS 40 nm technology in order to demonstrate the performance of NRZ/MT signaling in presence of high channel loss and strong crosstalk noise. The proposed system achieves 1 pJ/bit power efficiency, while communicating over a MDB memory channel at 36 Gb/s aggregate data rate

    Design of energy efficient high speed I/O interfaces

    Get PDF
    Energy efficiency has become a key performance metric for wireline high speed I/O interfaces. Consequently, design of low power I/O interfaces has garnered large interest that has mostly been focused on active power reduction techniques at peak data rate. In practice, most systems exhibit a wide range of data transfer patterns. As a result, low energy per bit operation at peak data rate does not necessarily translate to overall low energy operation. Therefore, I/O interfaces that can scale their power consumption with data rate requirement are desirable. Rapid on-off I/O interfaces have a potential to scale power with data rate requirements without severely affecting either latency or the throughput of the I/O interface. In this work, we explore circuit techniques for designing rapid on-off high speed wireline I/O interfaces and digital fractional-N PLLs. A burst-mode transmitter suitable for rapid on-off I/O interfaces is presented that achieves 6 ns turn-on time by utilizing a fast frequency settling ring oscillator in digital multiplying delay-locked loop and a rapid on-off biasing scheme for current mode output driver. Fabricated in 90 nm CMOS process, the prototype achieves 2.29 mW/Gb/s energy efficiency at peak data rate of 8 Gb/s. A 125X (8 Gb/s to 64 Mb/s) change in effective data rate results in 67X (18.29 mW to 0.27 mW) change in transmitter power consumption corresponding to only 2X (2.29 mW/Gb/s to 4.24 mW/Gb/s) degradation in energy efficiency for 32-byte long data bursts. We also present an analytical bit error rate (BER) computation technique for this transmitter under rapid on-off operation, which uses MDLL settling measurement data in conjunction with always-on transmitter measurements. This technique indicates that the BER bathtub width for 10^(โˆ’12) BER is 0.65 UI and 0.72 UI during rapid on-off operation and always-on operation, respectively. Next, a pulse response estimation-based technique is proposed enabling burst-mode operation for baud-rate sampling receivers that operate over high loss channels. Such receivers typically employ discrete time equalization to combat inter-symbol interference. Implementation details are provided for a receiver chip, fabricated in 65nm CMOS technology, that demonstrates efficacy of the proposed technique. A low complexity pulse response estimation technique is also presented for low power receivers that do not employ discrete time equalizers. We also present techniques for implementation of highly digital fractional-N PLL employing a phase interpolator based fractional divider to improve the quantization noise shaping properties of a 1-bit โˆ†ฮฃ frequency-to-digital converter. Fabricated in 65nm CMOS process, the prototype calibration-free fractional-N Type-II PLL employs the proposed frequency-to-digital converter in place of a high resolution time-to-digital converter and achieves 848 fs rms integrated jitter (1 kHz-30 MHz) and -101 dBc/Hz in-band phase noise while generating 5.054 GHz output from 31.25 MHz input

    ๋ฐ์ดํ„ฐ ์ „์†ก๋กœ ํ™•์žฅ์„ฑ๊ณผ ๋ฃจํ”„ ์„ ํ˜•์„ฑ์„ ํ–ฅ์ƒ์‹œํ‚จ ๋‹ค์ค‘์ฑ„๋„ ์ˆ˜์‹ ๊ธฐ๋“ค์— ๊ด€ํ•œ ์—ฐ๊ตฌ

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ (๋ฐ•์‚ฌ)-- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ์ „๊ธฐยท์ปดํ“จํ„ฐ๊ณตํ•™๋ถ€, 2013. 2. ์ •๋•๊ท .Two types of serial data communication receivers that adopt a multichannel architecture for a high aggregate I/O bandwidth are presented. Two techniques for collaboration and sharing among channels are proposed to enhance the loop-linearity and channel-expandability of multichannel receivers, respectively. The first proposed receiver employs a collaborative timing scheme recovery which relies on the sharing of all outputs of phase detectors (PDs) among channels to extract common information about the timing and multilevel signaling architecture of PAM-4. The shared timing information is processed by a common global loop filter and is used to update the phase of the voltage-controlled oscillator with better rejection of per-channel noise. In addition to collaborative timing recovery, a simple linearization technique for binary PDs is proposed. The technique realizes a high-rate oversampling PD while the hardware cost is equivalent to that of a conventional 2x-oversampling clock and data recovery. The first receiver exploiting the collaborative timing recovery architecture is designed using 45-nm CMOS technology. A single data lane occupies a 0.195-mm2 area and consumes a relatively low 17.9 mW at 6 Gb/s at 1.0V. Therefore, the power efficiency is 2.98 mW/Gb/s. The simulated jitter is about 0.034 UI RMS given an input jitter value of 0.03 UI RMS, while the relatively constant loop bandwidth with the PD linearization technique is about 7.3-MHz regardless of the data-stream noise. Unlike the first receiver, the second proposed multichannel receiver was designed to reduce the hardware complexity of each lane. The receiver employs shared calibration logic among channels and yet achieves superior channel expandability with slim data lanes. A shared global calibration control, which is used in a forwarded clock receiver based on a multiphase delay-locked loop, accomplishes skew calibration, equalizer adaptation, and the phase lock of all channels during a calibration period, resulting in reduced hardware overhead and less area required by each data lane. The second forwarded clock receiver is designed in 90-nm CMOS technology. It achieves error-free eye openings of more than 0.5 UI across 9โˆ’ 28 inch Nelco 4000-6 microstrips at 4โˆ’ 7 Gb/s and more than 0.42 UI at data rates of up to 9 Gb/s. The data lane occupies only 0.152 mm2 and consumes 69.8 mW, while the rest of the receiver occupies 0.297 mm2 and consumes 56 mW at a data rate of 7 Gb/s and a supply voltage of 1.35 V.1. Introduction 1 1.1 Motivations 1.2 Thesis Organization 2. Previous Receivers for Serial-Data Communications 2.1 Classification of the Links 2.2 Clocking architecture of transceivers 2.3 Components of receiver 2.3.1 Channel loss 2.3.2 Equalizer 2.3.3 Clock and data recovery circuit 2.3.3.1. Basic architecture 2.3.3.2. Phase detector 2.3.3.2.1. Linear phase detector 2.3.3.2.2. Binary phase detector 2.3.3.3. Frequency detector 2.3.3.4. Charge pump 2.3.3.5. Voltage controlled oscillator and delay-line 2.3.4 Loop dynamics of PLL 2.3.5 Loop dynamics of DLL 3. The Proposed PLL-Based Receiver with Loop Linearization Technique 3.1 Introduction 3.2 Motivation 3.3 Overview of binary phase detection 3.4 The proposed BBPD linearization technique 3.4.1 Architecture of the proposed PLL-based receiver 3.4.2 Linearization technique of binary phase detection 3.4.3 Rotational pattern of sampling phase offset 3.5 PD gain analysis and optimization 3.6 Loop Dynamics of the 2nd-order CDR 3.7 Verification with the time-accurate behavioral simulation 3.8 Summary 4. The Proposed DLL-Based Receiver with Forwarded-Clock 4.1 Introduction 4.2 Motivation 4.3 Design consideration 4.4 Architecture of the proposed forwarded-clock receiver 4.5 Circuit description 4.5.1 Analog multi-phase DLL 4.5.2 Dual-input interpolating deley cells 4.5.3 Dedicated half-rate data samplers 4.5.4 Cherry-Hooper continuous-time linear equalizer 4.5.5 Equalizer adaptation and phase-lock scheme 4.6 Measurement results 5. Conclusion 6. BibliographyDocto

    Design of energy-efficient high-speed wireline transceiver

    Get PDF
    Energy efficiency has become the most important performance metric of integrated circuits used in many applications ranging from mobile devices to high-performance processors. The power problem permeates both computing and communication systems alike. Especially in the era of Big Data, continuously growing demand for higher communication bandwidth is driving the need for energy-efficient high-speed I/O serial links. However, the rate at which the energy efficiency of serial links is improving is much slower than the rate at which the required data transfer bandwidth is increasing. This dissertation explores two design approaches for energy-efficient communication systems. The first design approach maximizes the energy efficiency of a transceiver without any performance loss, and as a prototype, a source-synchronous multi-Gb/s transceiver that achieves excellent energy efficiency lower than 0.3pJ/bit is presented. To this end, the proposed transceiver employs aggressive supply voltage scaling, and multiplexed transmitter and receiver synchronized by low-rate multi-phase clocks are adopted to achieve high data rate even at a supply voltage close to the device threshold voltage. Phase spacing errors resulting from device mismatches are corrected using a self-calibration scheme. The proposed phase calibration method uses a single digital delay-locked loop (DLL) for calibrating all the phases, which makes the calibration process insensitive to the supply voltage level. Thanks to this technique, the proposed multi-Gb/s transceiver operates robustly and energy-efficiently at a very low supply voltage. Fabricated in a 65nm CMOS process, the energy efficiency and data rate of the prototype transceiver vary from 0.29pJ/bit to 0.58pJ/bit and 1Gb/s to 6Gb/s, respectively, as the supply voltage is varied from 0.45V to 0.7V. In the second approach, observing that the data traffic in a real system is bursty, a full-rate burst-mode transceiver that achieves rapid on/off operation needed for energy-proportional systems is presented. By injecting input data edges into the oscillator embedded in a classical type-II digital clock and data recovery (CDR) circuit, the proposed receiver achieves instantaneous phase-locking and input jitter filtering simultaneously. In other words, the proposed CDR combines the advantages of conventional feed-forward and feedback architectures to achieve energy-proportional operation. By controlling the number of data edges injected into the oscillator, both the jitter transfer bandwidth and the jitter tolerance corner are accurately controlled. The feedback loop also corrects for any frequency error and helps improve the CDR's immunity to oscillator frequency drift during the power-on and -off states. This also improves the CDR's tolerance to consecutive identical digits present in the input data. Fabricated in a 90nm CMOS process, the prototype receiver instantaneously locks onto the very first data edge and consumes 6.1mW at 2.2Gb/s. Owing to its short power-on time, the overall transceiver's energy efficiency varies only from 5.4pJ/bit to 10.7pJ/bit when the effective data rate is varied from 2.2Gb/s to 0.22Gb/s
    corecore