9 research outputs found
Recommended from our members
Design of Power-Efficient Optical Transceivers and Design of High-Linearity Wireless Wideband Receivers
The combination of silicon photonics and advanced heterogeneous integration is promising for next-generation disaggregated data centers that demand large scale, high throughput, and low power. In this dissertation, we discuss the design and theory of power-efficient optical transceivers with System-in-Package (SiP) 2.5D integration. Combining prior arts and proposed circuit techniques, a receiver chip and a transmitter chip including two 10 Gb/s data channels and one 2.5 GHz clocking channel are designed and implemented in 28 nm CMOS technology.
An innovative transimpedance amplifier (TIA) and a single-ended to differential (S2D) converter are proposed and analyzed for a low-voltage high-sensitivity receiver; a four-to-one serializer, programmable output drivers, AC coupling units, and custom pads are implemented in a low-power transmitter; an improved quadrature locked loop (QLL) is employed to generate accurate quadrature clocks. In addition, we present an analysis for inverter-based shunt-feedback TIA to explicitly depict the trade-off among sensitivity, data rate, and power consumption. At last, the research on CDR-based clocking schemes for optical links is also discussed. We introduce prior arts and propose a power-efficient clocking scheme based on an injection-locked phase rotator. Next, we analyze injection-locked ring oscillators (ILROs) that have been widely used for quadrature clock generators (QCGs) in multi-lane optical or wireline transceivers due to their low power, low area, and technology scalability. The asymmetrical or partial injection locking from 2 phases to 4 phases results in imbalances in amplitude and phase. We propose a modified frequency-domain analysis to provide intuitive insight into the performance design trade-offs. The analysis is validated by comparing analytical predictions with simulations for an ILRO-based QCG in 28 nm CMOS technology.
This dissertation also discusses the design of high-linearity wireless wideband receivers. An out-of-band (OB) IM3 cancellation technique is proposed and analyzed. By exploiting a baseband auxiliary path (AP) with a high-pass feature, the in-band (IB) desired signal and out-of-band interferers are split. OB third-order intermodulation products (IM3) are reconstructed in the AP and cancelled in the baseband (BB). A 0.5-2.5 GHz frequency-translational noise-cancelling (FTNC) receiver is implemented in 65nm CMOS to demonstrate the proposed approach. It consumes 36 mW without cancellation at 1 GHz LO frequency and 1.2 V supply, and it achieves 8.8 MHz baseband bandwidth, 40dB gain, 3.3dB NF, 5dBm OB IIP3, and −6.5dBm OB B1dB. After IM3 cancellation, the effective OB-IIP3 increases to 32.5 dBm with an extra 34 mW for narrow-band interferers (two tones). For wideband interferers, 18.8 dB cancellation is demonstrated over 10 MHz with two −15 dBm modulated interferers. The local oscillator (LO) leakage is −92 dBm and −88 dB at 1 GHz and 2 GHz LO respectively. In summary, this technique achieves both high OB linearity and good LO isolation
Recommended from our members
Architectures and Circuits Leveraging Injection-Locked Oscillators for Ultra-Low Voltage Clock Synthesis and Reference-less Receivers for Dense Chip-to-Chip Communications
High performance computing is critical for the needs of scientific discovery and economic competitiveness. An extreme-scale computing system at 1000x the performance of today’s petaflop machines will exhibit massive parallelism on multiple vertical fronts, from thousands of computational units on a single processor to thousands of processors in a single data center. To facilitate such a massively-parallel extreme-scale computing, a key challenge is power. The challenge is not power associated with base computation but rather the problem of transporting data from one chip to another at high enough rates. This thesis presents architectures and techniques to achieve low power and area footprint while achieving high data rates in a dense very-short reach (VSR) chip-to-chip (C2C) communication network. High-speed serial communication operating at ultra-low supplies improves the energy-efficiency and lowers the power envelop of a system doing an exaflop of loops. One focus area of this thesis is clock synthesis for such energy-efficient interconnect applications operating at high speeds and ultra-low supplies. A sub-integer clockfrequency synthesizer is presented that incorporates a multi-phase injection-locked ring-oscillator-based prescaler for operation at an ultra-low supply voltage of 0.5V, phase-switching based programmable division for sub-integer clock-frequency synthesis, and automatic calibration to ensure injection lock. A record speed of 9GHz has been demonstrated at 0.5V in 45nm SOI CMOS. It consumes 3.5mW of power at 9.12GHz and 0.052 of area, while showing an output phase noise of -100dBc/Hz at 1MHz offset and RMS jitter of 325fs; it achieves a net of -186.5 in a 45-nm SOI CMOS process. This thesis also describes a receiver with a reference-less clocking architecture for high-density VSR-C2C links. This architecture simplifies clock-tree planning in dense extreme-scaling computing environments and has high-bandwidth CDR to enable SSC for suppressing EMI and to mitigate TX jitter requirements. It features clock-less DFE and a high-bandwidth CDR based on master-slave ILOs for phase generation/rotation. The RX is implemented in 14nm CMOS and characterized at 19Gb/s. It is 1.5x faster that previous reference-less embedded-oscillator based designs with greater than 100MHz jitter tolerance bandwidth and recovers error-free data over VSR-C2C channels. It achieves a power-efficiency of 2.9pJ/b while recovering error-free data (BER 200MHz and the INL of the ILO-based phase-rotator (32- Steps/UI) is <1-LSB. Lastly, this thesis develops a time-domain delay-based modeling of injection locking to describe injection-locking phenomena in nonharmonic oscillators. The model is used to predict the locking bandwidth, and the locking dynamics of the locked oscillator. The model predictions are verified against simulations and measurements of a four-stage differential ring oscillator. The model is further used to predict the injection-locking behavior of a single-ended CMOS inverter based ring oscillator, the lock range of a multi-phase injection-locked ring-oscillator-based prescaler, as well as the dynamics of tracking injection phase perturbations in injection-locked masterslave oscillators; demonstrating its versatility in application to any nonharmonic oscillator
Clocking and Skew-Optimization For Source-Synchronous Simultaneous Bidirectional Links
There is continuous expansion of computing capabilities in mobile devices which
demands higher I/O bandwidth and dense parallel links supporting higher data rates. Highspeed
signaling leverages technology advancements to achieve higher data rates but is limited
by the bandwidth of the electrical copper channel which have not scaled accordingly.
To meet the continuous data-rate demand, Simultaneous Bi-directional (SBD) signaling
technique is an attractive alternative relative to uni-directional signaling as it can work at
lower clock speeds, exhibits better spectral efficiency and provides higher throughput in
pad limited PCBs.
For low-power and more robust system, the SBD transceiver should utilize forwarded
clock system and per-pin de-skew circuits to correct the phase difference developed
between the data and clock. The system can be configured in two roles, master and
slave. To save more power, the system should have only one clock generator. The master
has its own clock source and shares its clock to the slave through the clock channel, and the
slave uses this forwarded clock to deserialize the inbound data and serialize the outbound
data. A clock-to-data skew exists which can be corrected with a phase tracking CDR. This
thesis presents a low-power implementation of forwarded clocking and clock-to-data skew
optimization for a 40 Gbps SBD transceiver. The design is implemented in 28nm CMOS
technology and consumes 8.8mW of power for 20 Gbps NRZ data at 0.9 V supply. The
area occupied by the clocking 0.018 mm^2 area
Design of High-Speed SerDes Transceiver for Chip-to-Chip Communications in CMOS Process
With the continuous increase of on-chip computation capacities and exponential growth of data-intensive applications, the high-speed data transmission through serial links has become the backbone for modern communication systems. To satisfy the massive data-exchanging requirement, the data rate of such serial links has been updated from several Gb/s to tens of Gb/s. Currently, the commercial standards such as Ethernet 400GbE, InfiniBand high data rate (HDR), and common electrical interface (CEI)-56G has been developing towards 40+ Gb/s. As the core component within these links, the transceiver chipset plays a fundamental role in balancing the operation speed, power consumption, area occupation, and operation range. Meanwhile, the CMOS process has become the dominant technology in modern transceiver chip fabrications due to its large-scale digital integration capability and aggressive pricing advantage. This research aims to explore advanced techniques that are capable of exploiting the maximum operation speed of the CMOS process, and hence provides potential solutions for 40+ Gb/s CMOS transceiver designs. The major contributions are summarized as follows.
A low jitter ring-oscillator-based injection-locked clock multiplier (RILCM) with a hybrid frequency tracking loop that consists of a traditional phase-locked loop (PLL), a timing-adjusted loop, and a loop selection state-machine is implemented in 65-nm C-MOS process. In the ring voltage-controlled oscillator, a full-swing pseudo-differential delay cell is proposed to lower the device noise to phase noise conversion. To obtain high operation speed and high detection accuracy, a compact timing-adjusted phase detector tightly combined with a well-matched charge pump is designed. Meanwhile, a lock-loss detection and lock recovery is devised to endow the RILCM with a similar lock-acquisition ability as conventional PLL, thus excluding the initial frequency set- I up aid and preventing the potential lock-loss risk. The experimental results show that the figure-of-merit of the designed RILCM reaches -247.3 dB, which is better than previous RILCMs and even comparable to the large-area LC-ILCMs.
The transmitter (TX) and receiver (RX) chips are separately designed and fab- ricated in 65-nm CMOS process. The transmitter chip employs a quarter-rate multi-multiplexer (MUX)-based 4-tap feed-forward equalizer (FFE) to pre-distort the output. To increase the maximum operating speed, a bandwidth-enhanced 4:1 MUX with the capability of eliminating charge-sharing effect is proposed. To produce the quarter-rate parallel data streams with appropriate delays, a compact latch array associated with an interleaved-retiming technique is designed. The receiver chip employs a two-stage continuous-time linear equalizer (CTLE) as the analog front-end and integrates an improved clock data recovery to extract the sampling clocks and retime the incoming data. To automatically balance the jitter tracking and jitter suppression, passive low-pass filters with adaptively-adjusted bandwidth are introduced into the data-sampling path. To optimize the linearity of the phase interpolation, a time-averaging-based compensating phase interpolator is proposed. For equalization, a combined TX-FFE and RX-CTLE is applied to compensate for the channel loss, where a low-cost edge-data correlation-based sign zero-forcing adaptation algorithm is proposed to automatically adjust the TX-FFE’s tap weights. Measurement results show that the fabricated transmitter/receiver chipset can deliver 40 Gb/s random data at a bit error rate of 16 dB loss at the half-baud frequency, while consuming a total power of 370 mW
Self-Calibrated, Low-Jitter and Low-Reference-Spur Injection-Locked Clock Multipliers
Department of Electrical EngineeringThis dissertation focuses primarily on the design of calibrators for the injection-locked clock multiplier (ILCM). ILCMs have advantage to achieve an excellent jitter performance at low cost, in terms of area and power consumption. The wide loop bandwidth (BW) of the injection technique could reject the noise of voltage-controlled oscillator (VCO), making it thus suitable for the rejection of poor noise of a ring-VCO and a high frequency LC-VCO. However, it is difficult to use without calibrators because of its sensitiveness in process-voltage-temperature (PVT) variations. In Chapter 2, conventional frequency calibrators are introduced and discussed. This dissertation introduces two types of calibrators for low-power high-frequency LC-VCO-based ILFMs in Chapter 3 and Chapter 4 and high-performance ring-VCO-based ILCM in Chapter 5.
First, Chapter 3 presents a low power and compact area LC-tank-based frequency multiplier. In the proposed architecture, the input signals have a pulsed waveform that involves many high-order harmonics. Using an LC-tank that amplifies only the target harmonic component, while suppressing others, the output signal at the target frequency can be obtained. Since the core current flows for a very short duration, due to the pulsed input signals, the average power consumption can be dramatically reduced. Effective removal of spurious tones due to the damping of the signal is achieved using a limiting amplifier. In this work, a prototype frequency tripler using the proposed architecture was designed in a 65 nm CMOS process. The power consumption was 950 ??W, and the active area was 0.08 mm2. At a 3.12 GHz frequency, the phase noise degradation with respect to the theoretical bound was less than 0.5 dB.
Second, Chapter 4 presents an ultra-low-phase-noise ILFM for millimeter wave (mm-wave) fifth-generation
(5G) transceivers. Using an ultra-low-power frequency-tracking loop (FTL), the proposed ILFM is able to correct the frequency drifts of the quadrature voltage-controlled oscillator of the ILFM in a real-time fashion. Since the FTL is monitoring the averages of phase deviations rather than detecting or sampling the instantaneous values, it requires only 600??W to continue to calibrate the ILFM that generates an mm-wave signal with an output frequency from 27 to 30 GHz. The proposed ILFM was fabricated in a 65-nm CMOS process. The 10-MHz phase noise of the 29.25-GHz output signal was ???129.7 dBc/Hz, and its variations across temperatures and supply voltages were less than 2 dB. The integrated phase noise from 1 kHz to 100 MHz and the rms jitter were???39.1 dBc and 86 fs, respectively.
Third, Chapter 5 presents a low-jitter, low-reference-spur ring voltage-controlled oscillator (ring
VCO)-based ILCM. Since the proposed triple-point frequency/phase/slope calibrator (TP-FPSC) can
accurately remove the three root causes of the frequency errors of ILCMs (i.e., frequency drift, phase offset, and slope modulation), the ILCM of this work is able to achieve a low-level reference spur. In addition, the calibrating loop for the frequency drift of the TP-FPSC offers an additional suppression to the in-band phase noise of the output signal. This capability of the TP-FPSC and the naturally wide bandwidth of the injection-locking mechanism allows the ILCM to achieve a very low RMS jitter. The ILCM was fabricated in a 65-nm CMOS technology. The measured reference spur and RMS jitter were ???72 dBc and 140 fs, respectively, both of which are the best among the state-of-the-art ILCMs. The active silicon area was 0.055 mm2, and the power consumption was 11.0 mW.clos
Design of clock and data recovery circuits for energy-efficient short-reach optical transceivers
Nowadays, the increasing demand for cloud based computing and social media
services mandates higher throughput (at least 56 Gb/s per data lane with 400
Gb/s total capacity 1) for short reach optical links (with the reach typically less
than 2 km) inside data centres. The immediate consequences are the huge
and power hungry data centers. To address these issues the intra-data-center
connectivity by means of optical links needs continuous upgrading.
In recent years, the trend in the industry has shifted toward the use of more
complex modulation formats like PAM4 due to its spectral efficiency over the
traditional NRZ. Another advantage is the reduced number of channels count
which is more cost-effective considering the required area and the I/O density.
However employing PAM4 results in more complex transceivers circuitry due
to the presence of multilevel transitions and reduced noise budget. In addition,
providing higher speed while accommodating the stringent requirements
of higher density and energy efficiency (< 5 pJ/bit), makes the design of the
optical links more challenging and requires innovative design techniques both
at the system and circuit level.
This work presents the design of a Clock and Data Recovery Circuit (CDR) as
one of the key building blocks for the transceiver modules used in such fibreoptic
links. Capable of working with PAM4 signalling format, the new proposed
CDR architecture targets data rates of 50−56 Gb/s while achieving the required
energy efficiency (< 5 pJ/bit).
At the system level, the design proposes a new PAM4 PD which provides a better
trade-off in terms of bandwidth and systematic jitter generation in the CDR. By
using a digital loop controller (DLC), the CDR gains considerable area reduction
with flexibility to adjust the loop dynamics.
At the circuit level it focuses on applying different circuit techniques to mitigate
the circuit imperfections. It presents a wideband analog front end (AFE),
suitable for a 56 Gb/s, 28-Gbaud PAM-4 signal, by using an 8x interleaved, master/
slave based sample and hold circuit. In addition, the AFE is equipped with
a calibration scheme which corrects the errors associated with the sampling
channels’ offset voltage and gain mismatches. The presented digital to phase
converter (DPC) features a modified phase interpolator (PI), a new quadrature
phase corrector (QPC) and multi-phase output with de-skewing capabilities.The DPC (as a standalone block) and the CDR (as the main focus of this work)
were fabricated in 65-nm CMOS technology. Based on the measurements, the
DPC achieves DNL/INL of 0.7/6 LSB respectively while consuming 40.5 mW
power from 1.05 V supply. Although the CDR was not fully operational with
the PAM4 input, the results from 25-Gbaud PAM2 (NRZ) test setup were used
to estimate the performance. Under this scenario, the 1-UI JTOL bandwidth
was measured to be 2 MHz with BER threshold of 10−4. The chip consumes 236
mW of power while operating on 1 − 1.2 V supply range achieving an energyefficiency
of 4.27 pJ/bit
A 2-11 GHz 7-Bit High-Linearity Phase Rotator Based on Wideband Injection-Locking Multi-Phase Generation for High-Speed Serial Links in 28-nm CMOS FDSOI
none6nopartially_openMonaco, Enrico; Anzalone, Gabriele; Albasini, Guido; Erba, Simone; Bassi, Matteo; Mazzanti, AndreaMonaco, Enrico; Anzalone, Gabriele; Albasini, Guido; Erba, Simone; Bassi, Matteo; Mazzanti, Andre