110 research outputs found
Toward realizing power scalable and energy proportional high-speed wireline links
Growing computational demand and the proliferation of cloud computing have placed high-speed
serial links at the center stage. Due to saturating energy efficiency improvements over the
last five years, increasing the data throughput comes at the cost of power consumption. Conventionally, serial link power can be reduced by optimizing individual building blocks such as
output drivers, receiver, or clock generation and distribution. However, this approach yields
very limited efficiency improvement. This dissertation takes an alternative approach toward
reducing the serial link power. Instead of optimizing the power of individual building blocks,
power of the entire serial link is reduced by exploiting serial link usage by the applications.
It has been demonstrated that serial links in servers are underutilized. On average, they
are used only 15% of the time, i.e. these links are idle for approximately 85% of the time.
Conventional links consume power during idle periods to maintain synchronization between
the transmitter and the receiver. However, by powering-off the link when idle and powering
it back when needed, power consumption of the serial link can be scaled proportionally to
its utilization. This approach of rapid power state transitioning is known as the rapid-on/off
approach. For the rapid-on/off to be effective, ideally the power-on time, off-state power,
and power state transition energy must all be close to zero. However, in practice, it is very
difficult to achieve these ideal conditions. Work presented in this dissertation addresses these
challenges.
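The energy-proportional argument above can be illustrated with a simple average-power model. This is a minimal sketch, not the dissertation's own model: the function name, the parameter names, and the example values (100 mW on-state power, 1 mW off-state power, 1 nJ per transition, 1e5 wake-ups per second) are hypothetical assumptions chosen only for illustration.

```python
def average_link_power(utilization, p_on, p_off=0.0,
                       e_transition=0.0, wakeups_per_s=0.0):
    """Average power (W) of a rapid-on/off link.

    utilization   -- fraction of time the link is on (0..1)
    p_on, p_off   -- power in the on and off states (W)
    e_transition  -- energy per off->on->off cycle (J), assumed parameter
    wakeups_per_s -- number of such cycles per second, assumed parameter
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be in [0, 1]")
    return (utilization * p_on
            + (1.0 - utilization) * p_off
            + e_transition * wakeups_per_s)


# Ideal rapid-on/off link: power scales exactly with the 15% utilization.
ideal = average_link_power(0.15, p_on=0.1)
# Non-ideal link with illustrative (assumed) off-state and transition overheads.
real = average_link_power(0.15, p_on=0.1, p_off=1e-3,
                          e_transition=1e-9, wakeups_per_s=1e5)
print(f"ideal: {ideal * 1e3:.2f} mW, with overheads: {real * 1e3:.2f} mW")
```

The two overhead terms are the reason the off-state power and the transition energy should both be close to zero: either one adds a floor that does not shrink with utilization.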
When this research work was started (2011-12), there were only a couple of research papers
available in the area of rapid-on/off links. No systematic study or design of rapid power state
transitioning in serial links was available in the literature. Since rapid-on/off with
nanosecond granularity is not a standard in any wireline communication, even the popular
test equipment does not support testing any such feature, nor was any formal measurement
methodology available. All these circumstances made the beginning difficult. However,
these challenges provided a unique opportunity to explore new architectural techniques and
identify trade-offs. The key contributions of this dissertation are as follows.
The first and foremost contribution is understanding the underlying limitations of saturating energy efficiency improvements in serial links and why there is a compelling need to
find alternative ways to reduce the serial link power.
The second contribution is to identify potential power saving techniques and evaluate the
challenges they pose and the opportunities they present.
The third contribution is the design of a 5Gb/s transmitter with a rapid-on/off feature.
The transmitter achieves rapid-on/off capability in voltage mode output driver by using
a fast-digital regulator, and in the clock multiplier by accurate frequency pre-setting and
periodic reference insertion. To ease timing requirements, an improved edge replacement
logic circuit for the clock multiplier is proposed. Mathematical modeling of power-on time
as a function of various circuit parameters is also discussed. The proposed transmitter
demonstrates energy proportional operation over wide variations of link utilization, and is,
therefore, suitable for energy efficient links. Fabricated in 90nm CMOS technology, the
voltage mode driver, and the clock multiplier achieve power-on-time of only 2ns and 10ns,
respectively. This dissertation highlights a key trade-off in the clock multiplier architecture:
fast power-on-lock capability is achieved at the cost of jitter performance.
The fourth contribution is the design of a 7GHz rapid-on/off LC-PLL based clock multiplier.
The phase locked loop (PLL) based multiplier was developed to overcome the limitations
of the MDLL based approach. The proposed temperature compensated LC-PLL achieves
power-on-lock in 1ns.
The fifth and biggest contribution of this dissertation is the design of a 7Gb/s embedded
clock transceiver, which achieves rapid-on/off capability in the LC-PLL, current-mode transmitter
and receiver. It was the first reported design of a complete transceiver, with an embedded
clock architecture, having rapid-on/off capability. A background phase calibration technique in
the PLL and CDR phase calibration logic in the receiver enable instantaneous lock on power-on.
The proposed transceiver demonstrates power scalability over a wide range of link utilization
and, therefore, helps in improving overall system efficiency. Fabricated in 65nm CMOS
technology, the 7Gb/s transceiver achieves power-on-lock in less than 20ns. The transceiver
achieves power scaling by 44x (63.7mW-to-1.43mW) and energy efficiency degradation by
only 2.2x (9.1pJ/bit-to-20.5pJ/bit), when the effective data rate (link utilization) changes
by 100x (7Gb/s-to-70Mb/s).
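The scaling figures quoted above can be cross-checked with a few lines of arithmetic, since energy per bit is simply power divided by effective data rate. The sketch below uses only the rounded numbers from the abstract; the helper name is hypothetical. With these rounded inputs the low-rate result comes out slightly below the quoted 20.5pJ/bit, which is presumably a rounding effect in the reported power.

```python
def energy_per_bit_pj(power_mw, rate_gbps):
    """Energy per bit in pJ/bit; mW divided by Gb/s is numerically pJ/bit."""
    return power_mw / rate_gbps


full = energy_per_bit_pj(63.7, 7.0)    # always-on operation at 7 Gb/s
low = energy_per_bit_pj(1.43, 0.07)    # 70 Mb/s effective data rate
power_scaling = 63.7 / 1.43            # reduction in power
rate_scaling = 7.0 / 0.07              # reduction in effective data rate

print(f"{full:.2f} pJ/bit at 7 Gb/s, {low:.2f} pJ/bit at 70 Mb/s")
print(f"power scales {power_scaling:.1f}x for a {rate_scaling:.0f}x rate change")
print(f"energy efficiency degrades {low / full:.2f}x")
```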
The sixth and final contribution is the design of a temperature sensor to compensate
the frequency drifts due to temperature variations, during long power-off periods, in the
fast power-on-lock LC-PLL. The proposed self-referenced VCO-based temperature sensor
is designed with all digital logic gates and achieves low supply sensitivity. This sensor is
suitable for integration in processor and DRAM environments. The proposed sensor works
on the principle of directly converting temperature information to frequency and finally
to digital bits. A novel sensing technique is proposed in which temperature information
is acquired by creating a threshold voltage difference between the transistors used in the
oscillators. Reduced supply sensitivity is achieved by employing junction capacitance, and
the overhead of voltage regulators and an external ideal reference frequency is avoided. The
effect of VCO phase noise on the sensor resolution is mathematically evaluated. Fabricated
in the 65nm CMOS process, the prototype can operate with a supply ranging from 0.85V
to 1.1V, and it achieves a supply sensitivity of 0.034°C/mV and an inaccuracy of ±0.9°C
and ±2.3°C from 0-100°C after 2-point calibration, with and without static nonlinearity
correction, respectively. It achieves a resolution of 0.3°C, a resolution FoM of 0.3(nJ/conv)·res²,
and a measurement (conversion) time of 6.5μs.
Analysis and Design of Energy Efficient Frequency Synthesizers for Wireless Integrated Systems
Advances in ultra-low power (ULP) circuit technologies are expanding IoT applications in our daily life. However, wireless connectivity, small form factor and long lifetime are still the key constraints for many envisioned wearable, implantable and maintenance-free monitoring systems to be practically deployed at a large scale. The frequency synthesizer is one of the most power-hungry and complicated blocks; it not only constrains RF performance but also offers only subtle scalability with power. Furthermore, the only indispensable off-chip component, the crystal oscillator, is also associated with the frequency synthesizer as a reference.
This thesis addresses the above issues by analyzing how phase noise of the LO affects the frequency modulated wireless system in different aspects and how different noise sources in the PLL affect the performance. Several chip prototypes have been demonstrated including: 1) A ULP FSK transmitter with SAR assisted FLL; 2) A ring oscillator based all-digital BLE transmitter utilizing a quarter RF frequency LO and 4X frequency multiplier; and 3) An XO-less BLE transmitter with an RF reference recovery receiver. The first 2 designs deal with noise sources in the PLL loop for ultimate power and cost reduction, while the third design deals with the reference noise outside the PLL and explores a way to replace the XO in ULP wireless edge nodes. Finally, a comprehensive PN theory is proposed as the design guideline.
PhD, Electrical Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
https://deepblue.lib.umich.edu/bitstream/2027.42/153420/1/chenxing_1.pd
Energy-efficient wireline transceivers
Power-efficient wireline transceivers are in high demand for many applications in high performance computation and communication systems. Apart from transferring a wide range of data rates to satisfy the interconnect bandwidth requirement, the transceivers have a very tight power budget and are expected to be fully integrated. This thesis explores enabling techniques to implement such transceivers at both the circuit and system levels. Specifically, three prototypes will be presented: (1) a 5Gb/s reference-less clock and data recovery circuit (CDR) using a phase-rotating phase-locked loop (PRPLL) to conduct phase control so as to break several fundamental trade-offs in conventional receivers; (2) a 4-10.5Gb/s continuous-rate CDR with a novel frequency acquisition scheme based on a bang-bang phase detector (BBPD) and a ring oscillator-based fractional-N PLL as the low noise wide range DCO in the CDR loop; (3) a source-synchronous energy-proportional link with dynamic voltage and frequency scaling (DVFS) and rapid on/off (ROO) techniques to cut the link power wastage at system level. The receiver/transceiver architectures are highly digital and address the requirements of new receiver architecture development, wide operating range, and low power/area consumption while being fully integrated. Experimental results obtained from the prototypes attest to the effectiveness of the proposed techniques.
Adaptive Receiver Design for High Speed Optical Communication
Conventional input/output (IO) links consume power, independent of changes
in the bandwidth demand by the system they are deployed in. As the system is
designed to satisfy the peak bandwidth demand, most of the time the IO links
are idle but still consuming power. In big data centers, the overall utilization
ratio of IO links is less than 10%, corresponding to a large amount of energy
wasted for idle operation.
This work demonstrates a 60 Gb/s high sensitivity non-return-to-zero (NRZ)
optical receiver in 14 nm FinFET technology with less than 7 ns power-on time.
The power on time includes the data detection, analog bias settling, photo-diode
DC current cancellation, and phase locking by the clock and data recovery circuit
(CDR). The receiver autonomously detects the data demand on the link
via a proposed link protocol and does not require any external enable or disable
signals. The proposed link protocol is designed to minimize the off-state power
consumption and power-on time of the link.
In order to achieve high data-rate and high-sensitivity while maintaining
the power budget, a 1-tap decision feedback equalization method is applied in
digital domain. The sensitivity is measured to be -8 dBm, -11 dBm, and -13 dBm
OMA (optical modulation amplitude) at 60 Gb/s, 48 Gb/s, and 32 Gb/s data rates,
respectively. The energy efficiency in always-on mode is around 2.2 pJ/bit for all
data-rates with the help of supply and bias scaling.
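The sensitivity and efficiency figures above can be put on a linear scale with two standard conversions: optical power in dBm to microwatts, and energy per bit times data rate to power. The sketch below is only an illustrative calculation from the numbers quoted in the abstract; the function name is hypothetical, and the power values are approximate because the abstract gives the efficiency as "around 2.2 pJ/bit".

```python
def dbm_to_uw(p_dbm):
    """Convert power from dBm to microwatts (0 dBm corresponds to 1 mW)."""
    return 1000.0 * 10.0 ** (p_dbm / 10.0)


# (data rate in Gb/s, measured OMA sensitivity in dBm) as quoted in the abstract
operating_points = [(60, -8), (48, -11), (32, -13)]

for rate_gbps, oma_dbm in operating_points:
    # pJ/bit multiplied by Gb/s gives mW; 2.2 pJ/bit is the approximate figure
    power_mw = 2.2 * rate_gbps
    print(f"{rate_gbps} Gb/s: OMA {dbm_to_uw(oma_dbm):.0f} uW, "
          f"about {power_mw:.0f} mW in always-on mode")
```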
The receiver incorporates a phase interpolator based clock-and-data recovery
circuit with approximately 80 MHz jitter-tolerance corner frequency, thanks to
the low-latency full custom CDR logic design.
This work demonstrates the fastest CMOS optical receiver reported to date and
runs at almost twice the data-rate of the state-of-the-art CMOS optical receiver
at the time of publication. The data-rate is comparable to BiCMOS optical
receivers but at a fraction of the power consumption.
Silicon-Based Terahertz Circuits and Systems
The Terahertz frequency range, often referred to as the `Terahertz' gap, lies wedged between microwave at the lower end and infrared at the higher end of the spectrum, occupying frequencies between 0.3-3.0 THz. For a long time, applications in THz frequencies had been limited to astronomy and chemical sciences, but with advancement in THz technology in recent years, it has shown great promise in a wide range of applications ranging from disease diagnostics, non-invasive early skin cancer detection, label-free DNA sequencing to security screening for concealed weapons and contraband detection, global environmental monitoring, nondestructive quality control and ultra-fast wireless communication. Up until recently, the terahertz frequency range has been mostly addressed by high mobility compound III-V processes, expensive nonlinear optics, or cryogenically cooled quantum cascade lasers. A low cost, room temperature alternative can enable the development of such a wide array of applications, not currently accessible due to cost and size limitations. In this thesis, we will discuss our approach towards development of integrated terahertz technology in silicon-based processes. In the spirit of academic research, we will address frequencies close to 0.3 THz as 'Terahertz'.
In this thesis, we address both fronts of integrated THz systems in silicon: THz power generation, radiation and transmitter systems, and THz signal detection and receiver systems. THz power generation in silicon-based integrated circuit technology is challenging due to lower carrier mobility, lower cut-off frequencies compared to compound III-V processes, lower breakdown voltages and lossy passives. Radiation from silicon chip is also challenging due to lossy substrates and high dielectric constant of silicon. In this work, we propose novel ways of combining circuit and electromagnetic techniques in a holistic design approach, which can overcome limitations of conventional block-by-block or partitioned design methodology, in order to generate high-frequency signals above the classical definition of cut-off frequencies (ft/fmax). We demonstrate this design philosophy in an active electromagnetic structure, which we call Distributed Active Radiator. It is inspired by an Inverse Maxwellian approach, where instead of using classical circuit and electromagnetic blocks to generate and radiate THz frequencies, we formulate surface (metal) currents in silicon chip for a desired THz field profile and develop active means of controlling different harmonic currents to perform signal generation, frequency multiplication, radiation and lossless filtering, simultaneously in a compact footprint. By removing the artificial boundaries between circuits, electromagnetics and antenna, we open ourselves to a broader design space. This enabled us to demonstrate the first 1 mW Effective-isotropic-radiated-power (EIRP) THz (0.29 THz) source in CMOS with total radiated power being three orders of magnitude more than previously demonstrated. We also proposed a near-field synchronization mechanism, which is a scalable method of realizing large arrays of synchronized autonomous radiating sources in silicon. We also demonstrate the first THz CMOS array with digitally controlled beam-scanning in 2D space with radiated output EIRP of nearly 10 mW at 0.28 THz.
On the receiver side, we use a similar electronics and electromagnetics co-design approach to realize a 4x4 pixel integrated silicon Terahertz camera demonstrating, to the best of our knowledge, the most sensitive silicon THz detector array without using post-processing, silicon lens or high-resistivity substrate options (NEP < 10 pW/√Hz at 0.26 THz). We put the 16 pixel silicon THz camera together with the CMOS DAR THz power generation arrays and demonstrated, for the first time, an all silicon THz imaging system with a CMOS source.
Design of clock and data recovery circuits for energy-efficient short-reach optical transceivers
Nowadays, the increasing demand for cloud based computing and social media
services mandates higher throughput (at least 56 Gb/s per data lane with 400
Gb/s total capacity) for short reach optical links (with the reach typically less
than 2 km) inside data centres. The immediate consequences are the huge
and power hungry data centers. To address these issues the intra-data-center
connectivity by means of optical links needs continuous upgrading.
In recent years, the trend in the industry has shifted toward the use of more
complex modulation formats like PAM4 due to its spectral efficiency over the
traditional NRZ. Another advantage is the reduced channel count,
which is more cost-effective considering the required area and the I/O density.
However, employing PAM4 results in more complex transceiver circuitry due
to the presence of multilevel transitions and reduced noise budget. In addition,
providing higher speed while accommodating the stringent requirements
of higher density and energy efficiency (< 5 pJ/bit), makes the design of the
optical links more challenging and requires innovative design techniques both
at the system and circuit level.
This work presents the design of a Clock and Data Recovery Circuit (CDR) as
one of the key building blocks for the transceiver modules used in such fibre-optic
links. Capable of working with PAM4 signalling format, the new proposed
CDR architecture targets data rates of 50–56 Gb/s while achieving the required
energy efficiency (< 5 pJ/bit).
At the system level, the design proposes a new PAM4 PD which provides a better
trade-off in terms of bandwidth and systematic jitter generation in the CDR. By
using a digital loop controller (DLC), the CDR gains considerable area reduction
with flexibility to adjust the loop dynamics.
At the circuit level it focuses on applying different circuit techniques to mitigate
the circuit imperfections. It presents a wideband analog front end (AFE),
suitable for a 56 Gb/s, 28-Gbaud PAM-4 signal, by using an 8x interleaved, master/slave
based sample and hold circuit. In addition, the AFE is equipped with
a calibration scheme which corrects the errors associated with the sampling
channels' offset voltage and gain mismatches. The presented digital to phase
converter (DPC) features a modified phase interpolator (PI), a new quadrature
phase corrector (QPC) and multi-phase output with de-skewing capabilities.
The DPC (as a standalone block) and the CDR (as the main focus of this work)
were fabricated in 65-nm CMOS technology. Based on the measurements, the
DPC achieves DNL/INL of 0.7/6 LSB respectively while consuming 40.5 mW
power from 1.05 V supply. Although the CDR was not fully operational with
the PAM4 input, the results from 25-Gbaud PAM2 (NRZ) test setup were used
to estimate the performance. Under this scenario, the 1-UI JTOL bandwidth
was measured to be 2 MHz with BER threshold of 10⁻⁴. The chip consumes 236
mW of power while operating on a 1–1.2 V supply range, achieving an energy efficiency
of 4.27 pJ/bit.
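The relation between the bit rates and symbol rates quoted in this abstract (56 Gb/s as 28-Gbaud PAM-4, and the 25-Gbaud NRZ test setup) follows from the number of bits carried per symbol. The sketch below illustrates it; the helper name is hypothetical, and the per-channel figure assumes one sample per symbol shared evenly across the 8 interleaved sample-and-hold channels, which the abstract does not state.

```python
import math


def baud_rate_gbaud(bit_rate_gbps, levels):
    """Symbol rate of a PAM signal with the given number of amplitude levels."""
    return bit_rate_gbps / math.log2(levels)


pam4_baud = baud_rate_gbaud(56.0, 4)   # 2 bits per symbol
nrz_bit_rate = 25.0 * math.log2(2)     # 25-Gbaud NRZ carries 1 bit per symbol
per_channel = pam4_baud / 8            # assumed: baud-rate sampling, 8 channels

print(f"PAM-4 at 56 Gb/s: {pam4_baud:.0f} Gbaud")
print(f"NRZ at 25 Gbaud: {nrz_bit_rate:.0f} Gb/s")
print(f"per-channel sampling rate (assumed): {per_channel:.1f} GS/s")
```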
Digital-based analog processing in nanoscale CMOS ICs for IoT applications
The Internet-of-Things (IoT) concept has been opening up a variety of applications, such as urban and environmental monitoring, smart health, surveillance, and home automation. Most of these IoT applications require more and more power/area efficient Complementary Metal–Oxide–Semiconductor (CMOS) systems and faster prototypes (lower time-to-market), demanding special modifications in the current IoT design system bottleneck: the analog/RF interfaces. Especially since the 2000s, it has been evident that there have been significant improvements in CMOS digital circuits when compared to analog building blocks. Digital circuits have been taking advantage of CMOS technology scaling in terms of speed, power consumption, and cost, while the techniques running behind the analog signal processing are still lagging. To decrease this historical gap, there has been an increasing trend in finding alternative IC design strategies to implement typical analog functions exploiting Digital in-Concept Design Methodologies (DCDM). This idea of re-thinking analog functions in digital terms has shown that analog IC blocks can also avail of the feature-size shrinking and energy efficiency of new technologies. This thesis deals with the development of DCDM, demonstrating its compatibility for Ultra-Low-Voltage (ULV) and Ultra-Low-Power (ULP) IoT applications. This work proves this statement by proposing new digital-based analog blocks, such as Operational Transconductance Amplifiers (OTAs) and an ac-coupled Bio-signal Amplifier (BioAmp). As an initial contribution, for the first time, a silicon demonstration of an embryonic Digital-Based OTA (DB-OTA) published in 2013 is exhibited. The fabricated DB-OTA test chip occupies a compact area of 1,426 µm², operating at supply voltages (VDD) down to 300 mV, consuming only 590 pW while driving a capacitive load of 80pF.
With a Total Harmonic Distortion (THD) lower than 5% for a 100mV input signal swing, its measured small-signal figure of merit (FOMS) and large-signal figure of merit (FOML) are 2,101 V⁻¹ and 1,070, respectively. To the best of this thesis author's knowledge, this measured power is the lowest reported to date in OTA literature, and its figures of merit are the best in sub-500mV OTAs reported to date. As the second step, mainly due to the robustness limitation of the previous DB-OTA, a novel calibration-free digital-based topology is proposed, named here Digital OTA (DIGOTA). A 180-nm DIGOTA test chip is also developed exhibiting an area below the 1000 µm² wall, 2.4nW power under 150pF load, and a minimum VDD of 0.25 V. The proposed DIGOTA is more digital-like compared with DB-OTA since no pseudo-resistor is needed. As the last contribution, the previously proposed DIGOTA is then used as a building block to demonstrate the operation principle of a power-efficient ULV and ultra-low area (ULA) fully-differential, digital-based Operational Transconductance Amplifier (OTA), suitable for microscale biosensing applications (BioDIGOTA) such as extreme low area Body Dust. Measured results in 180nm CMOS confirm that the proposed BioDIGOTA can work with a supply voltage down to 400 mV, consuming only 95 nW. The BioDIGOTA layout occupies only 0.022 mm² of total silicon area, lowering the area by 3.22× compared to the current state of the art while keeping reasonable system performance, such as 7.6 Noise Efficiency Factor (NEF) with 1.25 µVRMS input-referred noise over a 10 Hz bandwidth, 1.8% of THD, 62 dB of the common-mode rejection ratio (CMRR) and 55 dB of power supply rejection ratio (PSRR). After reviewing the current DCDM trend and all proposed silicon demonstrations, the thesis concludes that, despite the current analog design strategies involved during the analog block development
- …