A 10Gb/s Full On-chip Bang-Bang Clock and Data Recovery System Using an Adaptive Loop Bandwidth Strategy
As demand for higher-bandwidth I/O grows, the front-end design of serial links
becomes critical to meeting stringent timing requirements on noisy, bandwidth-limited
channels. Because the Clock and Data Recovery (CDR) circuit reconstructs the clock in a receiver,
the quality of its recovered clock largely determines receiver performance.
However, because the incoming jitter is unknown, it is difficult to optimize the loop dynamics to
minimize both steady-state and dynamic jitter.
In this thesis, a 10 Gb/s adaptive-loop-bandwidth clock and data recovery circuit
with an on-chip loop filter is presented. The proposed system adapts the loop bandwidth
to minimize jitter, which improves jitter tolerance.
The architecture tunes the loop bandwidth by a factor of eight based on the phase
information of the incoming data. It performs as well as a
maximum fixed-loop-bandwidth CDR when tracking high-speed input jitter and as well
as a minimum fixed-bandwidth CDR when suppressing wide-bandwidth steady-state jitter.
By employing a mixed-mode predictor, loop bandwidth adaptation with a high update rate
is achieved at low power consumption. Another relevant feature is that it
integrates the typically large off-chip filter on chip using a capacitance multiplication technique
that employs dual charge pumps.
The functionality of the proposed architecture has been verified through
schematic and behavioral model simulations. The simulations confirm that the
proposed solution improves jitter tolerance and is robust
to variations in the jitter profile. Its applicability to industrial standards is also
verified: its jitter tolerance meets the SONET OC-192 requirement.
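The trade-off described in this abstract can be illustrated with a small behavioral model. The sketch below is not the thesis's circuit or its mixed-mode predictor: it is a first-order bang-bang loop in which the adaptation rule (widen the bandwidth after a run of identical early/late decisions) and the names `bang_bang_cdr`, `gain_min`, `gain_max`, and `run_length` are assumptions chosen for illustration. Only the factor-of-eight ratio between the two gains is taken from the abstract.

```python
def bang_bang_cdr(phase_in, gain_min=0.002, gain_max=0.016, run_length=8):
    """Behavioral first-order bang-bang CDR loop (all phases in UI).

    Assumed adaptation rule (hypothetical, for illustration only):
    a run of identical early/late decisions means the loop is slewing,
    so the phase step is raised from gain_min to gain_max (8x).
    """
    phase_out = 0.0
    streak = 0
    last_sign = 0
    recovered = []
    for phi in phase_in:
        # Bang-bang phase detector: only the sign of the phase error is known.
        sign = 1 if phi - phase_out >= 0 else -1
        streak = streak + 1 if sign == last_sign else 1
        last_sign = sign
        # Wide bandwidth while slewing, narrow bandwidth while dithering.
        gain = gain_max if streak >= run_length else gain_min
        phase_out += gain * sign
        recovered.append(phase_out)
    return recovered


# A 1 UI phase step: compare the adaptive loop with a loop fixed at gain_min.
adaptive = bang_bang_cdr([1.0] * 600)
fixed_narrow = bang_bang_cdr([1.0] * 600, gain_max=0.002)
print(adaptive[99], fixed_narrow[99])
```

With these assumed gains the adaptive loop settles on the phase step much sooner than the narrow fixed loop, while its steady-state dither with a quiet input stays at the small step size.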
Clocking and Skew-Optimization For Source-Synchronous Simultaneous Bidirectional Links
The continuous expansion of computing capabilities in mobile devices
demands higher I/O bandwidth and dense parallel links supporting higher data rates. High-speed
signaling leverages technology advancements to achieve higher data rates but is limited
by the bandwidth of the electrical copper channel, which has not scaled accordingly.
To meet the continuous data-rate demand, the Simultaneous Bi-directional (SBD) signaling
technique is an attractive alternative to uni-directional signaling, as it can work at
lower clock speeds, exhibits better spectral efficiency, and provides higher throughput in
pad-limited PCBs.
For a low-power and more robust system, the SBD transceiver should use a forwarded-clock
scheme and per-pin de-skew circuits to correct the phase difference that develops
between the data and the clock. The system can be configured in two roles, master and
slave. To save more power, the system should have only one clock generator. The master
has its own clock source and shares its clock with the slave through the clock channel, and the
slave uses this forwarded clock to deserialize the inbound data and serialize the outbound
data. The resulting clock-to-data skew can be corrected with a phase-tracking CDR. This
thesis presents a low-power implementation of forwarded clocking and clock-to-data skew
optimization for a 40 Gbps SBD transceiver. The design is implemented in 28 nm CMOS
technology and consumes 8.8 mW of power for 20 Gbps NRZ data at a 0.9 V supply. The
clocking circuitry occupies an area of 0.018 mm^2.
Design of clock and data recovery circuits for energy-efficient short-reach optical transceivers
Nowadays, the increasing demand for cloud-based computing and social media
services mandates higher throughput (at least 56 Gb/s per data lane with 400
Gb/s total capacity) for short-reach optical links (with reach typically less
than 2 km) inside data centres. The immediate consequence is large,
power-hungry data centres. To address these issues, intra-data-centre
connectivity by means of optical links needs continuous upgrading.
In recent years, the trend in the industry has shifted toward the use of more
complex modulation formats such as PAM4 because of their spectral efficiency compared with
traditional NRZ. Another advantage is the reduced channel count,
which is more cost-effective considering the required area and I/O density.
However, employing PAM4 results in more complex transceiver circuitry because
of the multilevel transitions and the reduced noise budget. In addition,
providing higher speed while accommodating the stringent requirements
of higher density and energy efficiency (< 5 pJ/bit) makes the design of
optical links more challenging and requires innovative design techniques at both
the system and circuit levels.
This work presents the design of a Clock and Data Recovery (CDR) circuit as
one of the key building blocks of the transceiver modules used in such fibre-optic
links. Capable of working with the PAM4 signalling format, the proposed
CDR architecture targets data rates of 50-56 Gb/s while achieving the required
energy efficiency (< 5 pJ/bit).
At the system level, the design proposes a new PAM4 PD which provides a better
trade-off in terms of bandwidth and systematic jitter generation in the CDR. By
using a digital loop controller (DLC), the CDR gains considerable area reduction
with flexibility to adjust the loop dynamics.
At the circuit level, the work focuses on applying different circuit techniques to mitigate
circuit imperfections. It presents a wideband analog front end (AFE),
suitable for a 56 Gb/s, 28-Gbaud PAM-4 signal, that uses an 8x interleaved,
master/slave-based sample-and-hold circuit. In addition, the AFE is equipped with
a calibration scheme that corrects the errors associated with the sampling
channels' offset voltage and gain mismatches. The presented digital-to-phase
converter (DPC) features a modified phase interpolator (PI), a new quadrature
phase corrector (QPC), and multi-phase output with de-skewing capabilities.
The DPC (as a standalone block) and the CDR (as the main focus of this work)
were fabricated in 65-nm CMOS technology. Based on the measurements, the
DPC achieves DNL/INL of 0.7/6 LSB respectively while consuming 40.5 mW
from a 1.05 V supply. Although the CDR was not fully operational with
the PAM4 input, the results from a 25-Gbaud PAM2 (NRZ) test setup were used
to estimate the performance. Under this scenario, the 1-UI JTOL bandwidth
was measured to be 2 MHz with a BER threshold of 10^-4. The chip consumes 236
mW of power while operating over a 1-1.2 V supply range, achieving an energy
efficiency of 4.27 pJ/bit.
Clock Synchronisation Assisted Clock and Data Recovery for Sub-Nanosecond Data Centre Optical Switching
In current "cloud" data centres, switching of data between servers is performed using deep hierarchies of interconnected electronic packet switches. Demand for network bandwidth from emerging data centre workloads, combined with the slowing of silicon transistor scaling, is leading to a widening gap between data centre traffic demand and electronically-switched data centre network capacity. All-optical switches could offer a future-proof alternative, with potentially under a third of the power consumption and cost of electronically-switched networks. However, the effective bandwidth of optical switches depends on their overall switching time. This is dominated by the clock and data recovery (CDR) locking time, which takes hundreds of nanoseconds in commercial receivers. Current data centre traffic is dominated by small packets that transmit in tens of nanoseconds, leading to low effective bandwidth, as a high proportion of receiver time is spent performing CDR locking instead of receiving data, removing the benefits of optical switching. High-performance optical switching requires sub-nanosecond CDR locking time to overcome this limitation. This thesis proposes, models, and demonstrates clock synchronisation assisted CDR, which can achieve this. This approach uses clock synchronisation to reduce the complexity of CDR compared with previous asynchronous approaches. An analytical model of the technique is first derived that establishes its potential viability. Following this, two approaches to clock synchronisation assisted CDR are investigated:
1. Clock phase caching, which uses clock phase storage and regular updates in a 2 km intra-building scale data centre network interconnected by single-mode optical fibre.
2. Single calibration clock synchronisation assisted CDR, which leverages the 20 times lower thermal sensitivity of hollow core optical fibre versus single-mode fibre to synchronise a 100 m cluster scale data centre network, with a single initial phase calibration step.
Using a real-time FPGA-based optical switch testbed, sub-nanosecond CDR locking time was demonstrated for both approaches.
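The effect of CDR locking time on effective bandwidth can be made concrete with a short calculation. This is a simplified sketch that assumes the receiver spends one locking interval per packet and ignores every other switching overhead; the function name and the example durations (a 50 ns packet, 500 ns and 0.5 ns locking times) are illustrative values chosen to fall within the ranges the abstract mentions, not measurements from the thesis.

```python
def effective_bandwidth_fraction(packet_ns, locking_ns):
    """Fraction of receiver time spent on payload when each packet
    is preceded by one CDR locking interval (other overheads ignored)."""
    return packet_ns / (packet_ns + locking_ns)


# Hypothetical example values, chosen only to match the orders of magnitude
# in the abstract: tens of ns per packet, hundreds of ns vs sub-ns locking.
slow_lock = effective_bandwidth_fraction(packet_ns=50.0, locking_ns=500.0)
fast_lock = effective_bandwidth_fraction(packet_ns=50.0, locking_ns=0.5)
print(f"locking 500 ns: {slow_lock:.1%} of line rate")
print(f"locking 0.5 ns: {fast_lock:.1%} of line rate")
```

Under these assumptions the link carries payload for well under a tenth of the time with the slow lock and for about 99% of the time with the sub-nanosecond lock.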
Design of High-Speed SerDes Transceiver for Chip-to-Chip Communications in CMOS Process
With the continuous increase in on-chip computation capacity and the exponential growth of data-intensive applications, high-speed data transmission through serial links has become the backbone of modern communication systems. To satisfy the massive data-exchange requirement, the data rate of such serial links has increased from several Gb/s to tens of Gb/s. Currently, commercial standards such as Ethernet 400GbE, InfiniBand high data rate (HDR), and common electrical interface (CEI)-56G are developing towards 40+ Gb/s. As the core component within these links, the transceiver chipset plays a fundamental role in balancing operation speed, power consumption, area, and operation range. Meanwhile, CMOS has become the dominant technology in modern transceiver chip fabrication owing to its large-scale digital integration capability and aggressive pricing advantage. This research aims to explore advanced techniques that are capable of exploiting the maximum operation speed of the CMOS process, and hence provides potential solutions for 40+ Gb/s CMOS transceiver designs. The major contributions are summarized as follows.
A low-jitter ring-oscillator-based injection-locked clock multiplier (RILCM) with a hybrid frequency tracking loop that consists of a traditional phase-locked loop (PLL), a timing-adjusted loop, and a loop selection state machine is implemented in a 65-nm CMOS process. In the ring voltage-controlled oscillator, a full-swing pseudo-differential delay cell is proposed to lower the conversion of device noise to phase noise. To obtain high operation speed and high detection accuracy, a compact timing-adjusted phase detector tightly combined with a well-matched charge pump is designed. Meanwhile, lock-loss detection and lock recovery are devised to give the RILCM a lock-acquisition ability similar to that of a conventional PLL, thus removing the need for an initial frequency set-up aid and preventing the potential risk of lock loss. The experimental results show that the figure-of-merit of the designed RILCM reaches -247.3 dB, which is better than previous RILCMs and even comparable to the large-area LC-ILCMs.
The transmitter (TX) and receiver (RX) chips are separately designed and fabricated in a 65-nm CMOS process. The transmitter chip employs a quarter-rate multi-multiplexer (MUX)-based 4-tap feed-forward equalizer (FFE) to pre-distort the output. To increase the maximum operating speed, a bandwidth-enhanced 4:1 MUX capable of eliminating the charge-sharing effect is proposed. To produce the quarter-rate parallel data streams with appropriate delays, a compact latch array associated with an interleaved-retiming technique is designed. The receiver chip employs a two-stage continuous-time linear equalizer (CTLE) as the analog front-end and integrates an improved clock and data recovery circuit to extract the sampling clocks and retime the incoming data. To automatically balance jitter tracking and jitter suppression, passive low-pass filters with adaptively adjusted bandwidth are introduced into the data-sampling path. To optimize the linearity of the phase interpolation, a time-averaging-based compensating phase interpolator is proposed. For equalization, a combined TX-FFE and RX-CTLE is applied to compensate for the channel loss, where a low-cost edge-data correlation-based sign zero-forcing adaptation algorithm is proposed to automatically adjust the TX-FFE's tap weights. Measurement results show that the fabricated transmitter/receiver chipset can deliver 40 Gb/s random data at a bit error rate of 16 dB loss at the half-baud frequency, while consuming a total power of 370 mW.
Architectures and Circuits Leveraging Injection-Locked Oscillators for Ultra-Low Voltage Clock Synthesis and Reference-less Receivers for Dense Chip-to-Chip Communications
High-performance computing is critical for the needs of scientific discovery and economic competitiveness. An extreme-scale computing system at 1000x the performance of today's petaflop machines will exhibit massive parallelism on multiple vertical fronts, from thousands of computational units on a single processor to thousands of processors in a single data center. A key challenge in facilitating such massively parallel extreme-scale computing is power. The challenge is not the power associated with base computation but rather the problem of transporting data from one chip to another at high enough rates. This thesis presents architectures and techniques to achieve low power and area footprint while achieving high data rates in a dense very-short reach (VSR) chip-to-chip (C2C) communication network. High-speed serial communication operating at ultra-low supplies improves the energy efficiency and lowers the power envelope of a system doing an exaflop of loops. One focus area of this thesis is clock synthesis for such energy-efficient interconnect applications operating at high speeds and ultra-low supplies. A sub-integer clock-frequency synthesizer is presented that incorporates a multi-phase injection-locked ring-oscillator-based prescaler for operation at an ultra-low supply voltage of 0.5V, phase-switching based programmable division for sub-integer clock-frequency synthesis, and automatic calibration to ensure injection lock. A record speed of 9GHz has been demonstrated at 0.5V in 45nm SOI CMOS. It consumes 3.5mW of power at 9.12GHz and 0.052 of area, while showing an output phase noise of -100dBc/Hz at 1MHz offset and RMS jitter of 325fs; it achieves a net of -186.5 in a 45-nm SOI CMOS process. This thesis also describes a receiver with a reference-less clocking architecture for high-density VSR-C2C links.
This architecture simplifies clock-tree planning in dense extreme-scale computing environments and has a high-bandwidth CDR to enable SSC for suppressing EMI and to mitigate TX jitter requirements. It features a clock-less DFE and a high-bandwidth CDR based on master-slave ILOs for phase generation/rotation. The RX is implemented in 14nm CMOS and characterized at 19Gb/s. It is 1.5x faster than previous reference-less embedded-oscillator based designs with greater than 100MHz jitter tolerance bandwidth and recovers error-free data over VSR-C2C channels. It achieves a power-efficiency of 2.9pJ/b while recovering error-free data (BER 200MHz and the INL of the ILO-based phase-rotator (32 steps/UI) is <1-LSB. Lastly, this thesis develops a time-domain delay-based model of injection locking to describe injection-locking phenomena in nonharmonic oscillators. The model is used to predict the locking bandwidth and the locking dynamics of the locked oscillator. The model predictions are verified against simulations and measurements of a four-stage differential ring oscillator. The model is further used to predict the injection-locking behavior of a single-ended CMOS inverter based ring oscillator, the lock range of a multi-phase injection-locked ring-oscillator-based prescaler, and the dynamics of tracking injection phase perturbations in injection-locked master-slave oscillators, demonstrating its versatility in application to any nonharmonic oscillator.
Power-Proportional Optical Links
The continuous increase in data transfer rate in short-reach links, such as chip-to-chip and between servers within a data-center, demands high-speed links. As power efficiency becomes ever more important in these links, power-efficient optical links need to be designed.
Power efficiency in a link can be achieved by enabling power-proportional communication over the serial link. In power-proportional links, the power dissipated by a link is proportional to the amount of data communicated. Normally, data-rate demand is not constant, and the peak data-rate is not required all the time. If a link is not adapted according to the data-rate demand, there will be a fixed power dissipation, and the power efficiency of the link will degrade during the sub-maximal link utilization. Adapting links to real-time data-rate requirements reduces power dissipation. Power proportionality is achieved by scaling the power of the serial link linearly with the link utilization, and techniques such as variable data-rate and burst-mode can be adopted for this purpose. Links whose data rate (and hence power dissipation) can be varied in response to system demands are proposed in this work.
Past works have presented rapidly reconfigurable bandwidth in variable data-rate receivers, allowing lower power dissipation for lower data-rate operation. However, maintaining synchronization during reconfiguration was not possible since previous approaches have introduced changes in front-end delay when they are reconfigured. This work presents a technique that allows rapid bandwidth adjustment while maintaining a near-constant delay through the receiver suitable for a power-scalable variable data-rate optical link. Measurements of a fabricated integrated circuit (IC) show nearly constant energy per bit across a 2× variation in data rate while introducing less than 10 % of a unit interval (UI) of delay variation.
With continuously increasing data communication in data-centers, parallel optical links with ever-increasing per-lane data rates are being used to meet overall throughput demands. Simultaneously, power efficiency is becoming increasingly important for these links since they do not transmit useful data all the time. The burst-mode solution for vertical-cavity surface-emitting laser (VCSEL)-based point-to-point communication can be used to improve links’ energy efficiency during low link activity. The burst-mode technique for VCSEL-based links has not yet been deployed commercially. Past works have presented burst-mode solutions for single-channel receivers, allowing lower power dissipation during low link activity and solutions for fast activation of the receivers. However, this work presents a novel technique that allows rapid activation of a front-end and fast locking of a clock-and-data-recovery (CDR) for a multi-channel parallel link, utilizing opportunities arising from the parallel nature of many VCSEL-based links. The idea has been demonstrated through electrical and optical measurements of a fabricated IC at 10 Gbps, which show fast data detection and activation of the circuitry within 49 UIs while allowing the front-end to achieve better energy efficiency during low link activity. Simulation results are also presented in support of the proposed technique which allows the CDR to lock within 26 UIs from when it is powered on
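The definition of power proportionality given in this abstract can be checked with a few lines of arithmetic. The sketch below contrasts a link that always burns its peak power with an ideal link whose power scales linearly with utilization; the function name and the example figures (100 mW peak power, 10 Gbps peak rate) are hypothetical and are not measurements from the thesis.

```python
def energy_per_bit_pj(peak_power_mw, peak_rate_gbps, utilization, proportional):
    """Energy per delivered bit in pJ/bit (mW / Gbps = pJ/bit).

    proportional=False: the link dissipates peak power at any utilization.
    proportional=True:  power scales linearly with utilization (ideal case).
    """
    power_mw = peak_power_mw * utilization if proportional else peak_power_mw
    delivered_gbps = peak_rate_gbps * utilization
    return power_mw / delivered_gbps


# Hypothetical link: 100 mW at a 10 Gbps peak rate, used 25% of the time.
fixed = energy_per_bit_pj(100.0, 10.0, 0.25, proportional=False)
scaled = energy_per_bit_pj(100.0, 10.0, 0.25, proportional=True)
print(fixed, scaled)
```

In this idealized model the fixed-power link's energy per bit grows as utilization falls, while the proportional link's energy per bit stays at its full-rate value, which is the behaviour that variable data-rate and burst-mode techniques aim to approach.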
Modelling and performance analysis of multigigabit serial interconnects using real number based analog verification methods
The increasing importance of multigigabit transceiver circuits in modern chip design calls for new methods of analyzing and integrating these challenging building blocks. This work presents a design and analysis framework based on the SystemVerilog real number modeling approach. It further extends the simulation possibilities thus obtained by introducing additional higher-level numeric modelling and evaluation methods to support multigigabit statistical link budgeting procedures based on the Peak Distortion Algorithm.
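For readers unfamiliar with peak distortion, the core worst-case calculation can be sketched in a few lines. This is the textbook form for binary (±1) signalling, in which every inter-symbol interference term is assumed to add against the main cursor; it is not the framework from this work, and the function name and the sample pulse response are hypothetical.

```python
def worst_case_eye_height(pulse_response, cursor_index):
    """Worst-case vertical eye opening for +/-1 symbols.

    pulse_response: baud-spaced samples of the channel pulse response.
    cursor_index:   index of the main cursor sample.
    Every pre- and post-cursor sample is assumed to interfere
    destructively, so the eye height is 2 * (h0 - sum(|hk|)).
    """
    main_cursor = pulse_response[cursor_index]
    isi = sum(abs(h) for i, h in enumerate(pulse_response) if i != cursor_index)
    return 2.0 * (main_cursor - isi)


# Hypothetical pulse response: one pre-cursor, main cursor, two post-cursors.
samples = [0.05, 0.8, 0.2, -0.1]
print(worst_case_eye_height(samples, cursor_index=1))
```

A result at or below zero means the worst-case data pattern closes the eye at that sampling phase.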
Modelização em MatLab® de interfaces de comunicação de alto débito
Master's degree in Electronics and Telecommunications Engineering
Nowadays, high-speed digital data transmission is under continuous development. The constant increase in bit rates has led to the need for more sophisticated and complex receivers, systems that recover the data transmitted over a dispersive channel that degrades the quality of the transmitted signal. Therefore, the receiver must compensate for the distortion introduced by the channel and synchronize the received signal, which, in addition to distortion, is also affected by jitter.
The distortion introduced by the channel is attenuated by means of equalization circuits that compensate the channel frequency response at the transmission rate, making it as flat as possible for the desired frequency. The synchronization of the received signal, in turn, is achieved by means of clock and data recovery circuits, which usually generate a clock signal from the transitions of the data signal and then use it to sample the received data.
The main focus of this thesis is the modeling of a data receiver for a high-speed interface. The simulation of the data receiver block requires modeling a transmission channel according to its characteristics.
The proposed transmission system, from the transmitter to the output of the data recovery block, includes equalization filters for signal conditioning, of which several distinct architectures are studied.
Two architectures are proposed for the clock and data recovery circuit. The first is a 2x oversampling clock and data recovery circuit based on a Phase Tracking architecture. The second is a 3x oversampling clock and data recovery circuit based on a Blind Sampling architecture.
Both architectures of the clock and data recovery circuit are modeled in order to analyze their jitter tolerance curves. It is crucial to know the amount of jitter that these circuits can tolerate while still recovering the data with a satisfactory bit error ratio. The results obtained show a very close match to the theoretical values, with the 2x and 3x oversampling architectures presenting a jitter tolerance of approximately 12UI and 23UI respectively at low jitter frequencies.