110 research outputs found

    Toward realizing power scalable and energy proportional high-speed wireline links

    Get PDF
    Growing computational demand and proliferation of cloud computing has placed high-speed serial links at the center stage. Due to saturating energy efficiency improvements over the last five years, increasing the data throughput comes at the cost of power consumption. Conventionally, serial link power can be reduced by optimizing individual building blocks such as output drivers, receiver, or clock generation and distribution. However, this approach yields very limited efficiency improvement. This dissertation takes an alternative approach toward reducing the serial link power. Instead of optimizing the power of individual building blocks, power of the entire serial link is reduced by exploiting serial link usage by the applications. It has been demonstrated that serial links in servers are underutilized. On average, they are used only 15% of the time, i.e. these links are idle for approximately 85% of the time. Conventional links consume power during idle periods to maintain synchronization between the transmitter and the receiver. However, by powering-off the link when idle and powering it back when needed, power consumption of the serial link can be scaled proportionally to its utilization. This approach of rapid power state transitioning is known as the rapid-on/off approach. For the rapid-on/off to be effective, ideally the power-on time, off-state power, and power state transition energy must all be close to zero. However, in practice, it is very difficult to achieve these ideal conditions. Work presented in this dissertation addresses these challenges. When this research work was started (2011-12), there were only a couple of research papers available in the area of rapid-on/off links. Systematic study or design of a rapid power state transitioning in serial links was not available in the literature. Since rapid-on/off with nanoseconds granularity is not a standard in any wireline communication, even the popular test equipment does not support testing any such feature, neither any formal measurement methodology was available. All these circumstances made the beginning difficult. However, these challenges provided a unique opportunity to explore new architectural techniques and identify trade-offs. The key contributions of this dissertation are as follows. The first and foremost contribution is understanding the underlying limitations of saturating energy efficiency improvements in serial links and why there is a compelling need to find alternative ways to reduce the serial link power. The second contribution is to identify potential power saving techniques and evaluate the challenges they pose and the opportunities they present. The third contribution is the design of a 5Gb/s transmitter with a rapid-on/off feature. The transmitter achieves rapid-on/off capability in voltage mode output driver by using a fast-digital regulator, and in the clock multiplier by accurate frequency pre-setting and periodic reference insertion. To ease timing requirements, an improved edge replacement logic circuit for the clock multiplier is proposed. Mathematical modeling of power-on time as a function of various circuit parameters is also discussed. The proposed transmitter demonstrates energy proportional operation over wide variations of link utilization, and is, therefore, suitable for energy efficient links. Fabricated in 90nm CMOS technology, the voltage mode driver, and the clock multiplier achieve power-on-time of only 2ns and 10ns, respectively. This dissertation highlights key trade-off in the clock multiplier architecture, to achieve fast power-on-lock capability at the cost of jitter performance. The fourth contribution is the design of a 7GHz rapid-on/off LC-PLL based clock multi- plier. The phase locked loop (PLL) based multiplier was developed to overcome the limita- tions of the MDLL based approach. Proposed temperature compensated LC-PLL achieves power-on-lock in 1ns. The fifth and biggest contribution of this dissertation is the design of a 7Gb/s embedded clock transceiver, which achieves rapid-on/off capability in LC-PLL, current-mode transmit- ter and receiver. It was the first reported design of a complete transceiver, with an embedded clock architecture, having rapid-on/off capability. Background phase calibration technique in PLL and CDR phase calibration logic in the receiver enable instantaneous lock on power-on. The proposed transceiver demonstrates power scalability with a wide range of link utiliza- tion and, therefore, helps in improving overall system efficiency. Fabricated in 65nm CMOS technology, the 7Gb/s transceiver achieves power-on-lock in less than 20ns. The transceiver achieves power scaling by 44x (63.7mW-to-1.43mW) and energy efficiency degradation by only 2.2x (9.1pJ/bit-to-20.5pJ/bit), when the effective data rate (link utilization) changes by 100x (7Gb/s-to-70Mb/s). The sixth and final contribution is the design of a temperature sensor to compensate the frequency drifts due to temperature variations, during long power-off periods, in the fast power-on-lock LC-PLL. The proposed self-referenced VCO-based temperature sensor is designed with all digital logic gates and achieves low supply sensitivity. This sensor is suitable for integration in processor and DRAM environments. The proposed sensor works on the principle of directly converting temperature information to frequency and finally to digital bits. A novel sensing technique is proposed in which temperature information is acquired by creating a threshold voltage difference between the transistors used in the oscillators. Reduced supply sensitivity is achieved by employing junction capacitance, and the overhead of voltage regulators and an external ideal reference frequency is avoided. The effect of VCO phase noise on the sensor resolution is mathematically evaluated. Fabricated in the 65nm CMOS process, the prototype can operate with a supply ranging from 0.85V to 1.1V, and it achieves a supply sensitivity of 0.034oC/mV and an inaccuracy of ±0.9oC and ±2.3oC from 0-100oC after 2-point calibration, with and without static nonlinearity correction, respectively. It achieves a resolution of 0.3oC, resolution FoM of 0.3(nJ/conv)res2 , and measurement (conversion) time of 6.5Όs

    Analysis and Design of Energy Efficient Frequency Synthesizers for Wireless Integrated Systems

    Full text link
    Advances in ultra-low power (ULP) circuit technologies are expanding the IoT applications in our daily life. However, wireless connectivity, small form factor and long lifetime are still the key constraints for many envisioned wearable, implantable and maintenance-free monitoring systems to be practically deployed at a large scale. The frequency synthesizer is one of the most power hungry and complicated blocks that not only constraints RF performance but also offers subtle scalability with power as well. Furthermore, the only indispensable off-chip component, the crystal oscillator, is also associated with the frequency synthesizer as a reference. This thesis addresses the above issues by analyzing how phase noise of the LO affect the frequency modulated wireless system in different aspects and how different noise sources in the PLL affect the performance. Several chip prototypes have been demonstrated including: 1) An ULP FSK transmitter with SAR assisted FLL; 2) A ring oscillator based all-digital BLE transmitter utilizing a quarter RF frequency LO and 4X frequency multiplier; and 3) An XO-less BLE transmitter with an RF reference recovery receiver. The first 2 designs deal with noise sources in the PLL loop for ultimate power and cost reduction, while the third design deals with the reference noise outside the PLL and explores a way to replace the XO in ULP wireless edge nodes. And at last, a comprehensive PN theory is proposed as the design guideline.PHDElectrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/153420/1/chenxing_1.pd

    Energy-efficient wireline transceivers

    Get PDF
    Power-efficient wireline transceivers are highly demanded by many applications in high performance computation and communication systems. Apart from transferring a wide range of data rates to satisfy the interconnect bandwidth requirement, the transceivers have very tight power budget and are expected to be fully integrated. This thesis explores enabling techniques to implement such transceivers in both circuit and system levels. Specifically, three prototypes will be presented: (1) a 5Gb/s reference-less clock and data recovery circuit (CDR) using phase-rotating phase-locked loop (PRPLL) to conduct phase control so as to break several fundamental trade-offs in conventional receivers; (2) a 4-10.5Gb/s continuous-rate CDR with novel frequency acquisition scheme based on bang-bang phase detector (BBPD) and a ring oscillator-based fractional-N PLL as the low noise wide range DCO in the CDR loop; (3) a source-synchronous energy-proportional link with dynamic voltage and frequency scaling (DVFS) and rapid on/off (ROO) techniques to cut the link power wastage at system level. The receiver/transceiver architectures are highly digital and address the requirements of new receiver architecture development, wide operating range, and low power/area consumption while being fully integrated. Experimental results obtained from the prototypes attest the effectiveness of the proposed techniques

    Adaptive Receiver Design for High Speed Optical Communication

    Get PDF
    Conventional input/output (IO) links consume power, independent of changes in the bandwidth demand by the system they are deployed in. As the system is designed to satisfy the peak bandwidth demand, most of the time the IO links are idle but still consuming power. In big data centers, the overall utilization ratio of IO links is less than 10%, corresponding to a large amount of energy wasted for idle operation. This work demonstrates a 60 Gb/s high sensitivity non-return-to-zero (NRZ) optical receiver in 14 nm FinFET technology with less than 7 ns power-on time. The power on time includes the data detection, analog bias settling, photo-diode DC current cancellation, and phase locking by the clock and data recovery circuit (CDR). The receiver autonomously detects the data demand on the link via a proposed link protocol and does not require any external enable or disable signals. The proposed link protocol is designed to minimize the off-state power consumption and power-on time of the link. In order to achieve high data-rate and high-sensitivity while maintaining the power budget, a 1-tap decision feedback equalization method is applied in digital domain. The sensitivity is measured to be -8 dBm, -11 dBm, and -13 dBm OMA (optical modulation amplitude) at 60 Gb/s, 48 Gb/s, and 32 Gb/s data rates, respectively. The energy efficiency in always-on mode is around 2.2 pJ/bit for all data-rates with the help of supply and bias scaling. The receiver incorporates a phase interpolator based clock-and-data recovery circuit with approximately 80 MHz jitter-tolerance corner frequency, thanks to the low-latency full custom CDR logic design. This work demonstrates the fastest ever reported CMOS optical receiver and runs almost at twice the data-rate of the state-of-the-art CMOS optical receiver by the time of the publication. The data-rate is comparable to BiCMOS optical receivers but at a fraction of the power consumption

    Silicon-Based Terahertz Circuits and Systems

    Get PDF
    The Terahertz frequency range, often referred to as the `Terahertz' gap, lies wedged between microwave at the lower end and infrared at the higher end of the spectrum, occupying frequencies between 0.3-3.0 THz. For a long time, applications in THz frequencies had been limited to astronomy and chemical sciences, but with advancement in THz technology in recent years, it has shown great promise in a wide range of applications ranging from disease diagnostics, non-invasive early skin cancer detection, label-free DNA sequencing to security screening for concealed weapons and contraband detection, global environmental monitoring, nondestructive quality control and ultra-fast wireless communication. Up until recently, the terahertz frequency range has been mostly addressed by high mobility compound III-V processes, expensive nonlinear optics, or cryogenically cooled quantum cascade lasers. A low cost, room temperature alternative can enable the development of such a wide array of applications, not currently accessible due to cost and size limitations. In this thesis, we will discuss our approach towards development of integrated terahertz technology in silicon-based processes. In the spirit of academic research, we will address frequencies close to 0.3 THz as 'Terahertz'. In this thesis, we address both fronts of integrated THz systems in silicon: THz power generation, radiation and transmitter systems, and THz signal detection and receiver systems. THz power generation in silicon-based integrated circuit technology is challenging due to lower carrier mobility, lower cut-o frequencies compared to compound III-V processes, lower breakdown voltages and lossy passives. Radiation from silicon chip is also challenging due to lossy substrates and high dielectric constant of silicon. In this work, we propose novel ways of combining circuit and electromagnetic techniques in a holistic design approach, which can overcome limitations of conventional block-by-block or partitioned design methodology, in order to generate high-frequency signals above the classical definition of cut-off frequencies (ƒt/ƒmax). We demonstrate this design philosophy in an active electromagnetic structure, which we call Distributed Active Radiator. It is inspired by an Inverse Maxwellian approach, where instead of using classical circuit and electromagnetic blocks to generate and radiate THz frequencies, we formulate surface (metal) currents in silicon chip for a desired THz field prole and develop active means of controlling different harmonic currents to perform signal generation, frequency multiplication, radiation and lossless filtering, simultaneously in a compact footprint. By removing the articial boundaries between circuits, electromagnetics and antenna, we open ourselves to a broader design space. This enabled us to demonstrate the rst 1 mW Eective-isotropic-radiated-power(EIRP) THz (0.29 THz) source in CMOS with total radiated power being three orders of magnitude more than previously demonstrated. We also proposed a near-field synchronization mechanism, which is a scalable method of realizing large arrays of synchronized autonomous radiating sources in silicon. We also demonstrate the first THz CMOS array with digitally controlled beam-scanning in 2D space with radiated output EIRP of nearly 10 mW at 0.28 THz. On the receiver side, we use a similar electronics and electromagnetics co-design approach to realize a 4x4 pixel integrated silicon Terahertz camera demonstrating to the best of our knowledge, the most sensitive silicon THz detector array without using post-processing, silicon lens or high-resistivity substrate options (NEP &lt; 10 pW &#8730; Hz at 0.26 THz). We put the 16 pixel silicon THz camera together with the CMOS DAR THz power generation arrays and demonstrated, for the first time, an all silicon THz imaging system with a CMOS source.</p

    Design of clock and data recovery circuits for energy-efficient short-reach optical transceivers

    Get PDF
    Nowadays, the increasing demand for cloud based computing and social media services mandates higher throughput (at least 56 Gb/s per data lane with 400 Gb/s total capacity 1) for short reach optical links (with the reach typically less than 2 km) inside data centres. The immediate consequences are the huge and power hungry data centers. To address these issues the intra-data-center connectivity by means of optical links needs continuous upgrading. In recent years, the trend in the industry has shifted toward the use of more complex modulation formats like PAM4 due to its spectral efficiency over the traditional NRZ. Another advantage is the reduced number of channels count which is more cost-effective considering the required area and the I/O density. However employing PAM4 results in more complex transceivers circuitry due to the presence of multilevel transitions and reduced noise budget. In addition, providing higher speed while accommodating the stringent requirements of higher density and energy efficiency (< 5 pJ/bit), makes the design of the optical links more challenging and requires innovative design techniques both at the system and circuit level. This work presents the design of a Clock and Data Recovery Circuit (CDR) as one of the key building blocks for the transceiver modules used in such fibreoptic links. Capable of working with PAM4 signalling format, the new proposed CDR architecture targets data rates of 50−56 Gb/s while achieving the required energy efficiency (< 5 pJ/bit). At the system level, the design proposes a new PAM4 PD which provides a better trade-off in terms of bandwidth and systematic jitter generation in the CDR. By using a digital loop controller (DLC), the CDR gains considerable area reduction with flexibility to adjust the loop dynamics. At the circuit level it focuses on applying different circuit techniques to mitigate the circuit imperfections. It presents a wideband analog front end (AFE), suitable for a 56 Gb/s, 28-Gbaud PAM-4 signal, by using an 8x interleaved, master/ slave based sample and hold circuit. In addition, the AFE is equipped with a calibration scheme which corrects the errors associated with the sampling channels’ offset voltage and gain mismatches. The presented digital to phase converter (DPC) features a modified phase interpolator (PI), a new quadrature phase corrector (QPC) and multi-phase output with de-skewing capabilities.The DPC (as a standalone block) and the CDR (as the main focus of this work) were fabricated in 65-nm CMOS technology. Based on the measurements, the DPC achieves DNL/INL of 0.7/6 LSB respectively while consuming 40.5 mW power from 1.05 V supply. Although the CDR was not fully operational with the PAM4 input, the results from 25-Gbaud PAM2 (NRZ) test setup were used to estimate the performance. Under this scenario, the 1-UI JTOL bandwidth was measured to be 2 MHz with BER threshold of 10−4. The chip consumes 236 mW of power while operating on 1 − 1.2 V supply range achieving an energyefficiency of 4.27 pJ/bit

    Digital-based analog processing in nanoscale CMOS ICs for IoT applications

    Get PDF
    The Internet-of-Things (IoT) concept has been opening up a variety of applications, such as urban and environmental monitoring, smart health, surveillance, and home automation. Most of these IoT applications require more and more power/area efficient Complemen tary Metal–Oxide–Semiconductor (CMOS) systems and faster prototypes (lower time-to market), demanding special modifications in the current IoT design system bottleneck: the analog/RF interfaces. Specially after the 2000s, it is evident that there have been significant improvements in CMOS digital circuits when compared to analog building blocks. Digital circuits have been taking advantage of CMOS technology scaling in terms of speed, power consump tion, and cost, while the techniques running behind the analog signal processing are still lagging. To decrease this historical gap, there has been an increasing trend in finding alternative IC design strategies to implement typical analog functions exploiting Digital in-Concept Design Methodologies (DCDM). This idea of re-thinking analog functions in digital terms has shown that Analog ICs blocks can also avail of the feature-size shrinking and energy efficiency of new technologies. This thesis deals with the development of DCDM, demonstrating its compatibility for Ultra-Low-Voltage (ULV) and Power (ULP) IoT applications. This work proves this state ment through the proposing of new digital-based analog blocks, such as an Operational Transconductance Amplifiers (OTAs) and an ac-coupled Bio-signal Amplifier (BioAmp). As an initial contribution, for the first time, a silicon demonstration of an embryonic Digital-Based OTA (DB-OTA) published in 2013 is exhibited. The fabricated DB-OTA test chip occupies a compact area of 1,426 ”m2 , operating at supply voltages (VDD) down to 300 mV, consuming only 590 pW while driving a capacitive load of 80pF. With a Total Harmonic Distortion (THD) lower than 5% for a 100mV input signal swing, its measured small-signal figure of merit (FOMS) and large-signal figure of merit (FOML) are 2,101 V −1 and 1,070, respectively. To the best of this thesis author’s knowledge, this measured power is the lowest reported to date in OTA literature, and its figures of merit are the best in sub-500mV OTAs reported to date. As the second step, mainly due to the robustness limitation of previous DB-OTA, a novel calibration-free digital-based topology is proposed, named here as Digital OTA (DIG OTA). A 180-nm DIGOTA test chip is also developed exhibiting an area below the 1000 ”m2 wall, 2.4nW power under 150pF load, and a minimum VDD of 0.25 V. The proposed DIGOTA is more digital-like compared with DB-OTA since no pseudo-resistor is needed. As the last contribution, the previously proposed DIGOTA is then used as a building block to demonstrate the operation principle of power-efficient ULV and ultra-low area (ULA) fully-differential, digital-based Operational Transconductance Amplifier (OTA), suitable for microscale biosensing applications (BioDIGOTA) such as extreme low area Body Dust. Measured results in 180nm CMOS confirm that the proposed BioDIGOTA can work with a supply voltage down to 400 mV, consuming only 95 nW. The BioDIGOTA layout occupies only 0.022 mm2 of total silicon area, lowering the area by 3.22X times compared to the current state of the art while keeping reasonable system performance, such as 7.6 Noise Efficiency Factor (NEF) with 1.25 ”VRMS input-referred noise over a 10 Hz bandwidth, 1.8% of THD, 62 dB of the common-mode rejection ratio (CMRR) and 55 dB of power supply rejection ratio (PSRR). After reviewing the current DCDM trend and all proposed silicon demonstrations, the thesis concludes that, despite the current analog design strategies involved during the analog block development
    • 

    corecore