61 research outputs found

    A 10Gb/s Full On-chip Bang-Bang Clock and Data Recovery System Using an Adaptive Loop Bandwidth Strategy

    Get PDF
    As demand for higher bandwidth I/O grows, the front end design of serial link becomes significant to overcome stringent timing requirements on noisy and bandwidthlimited channels. As a clock reconstructing module in a receiver, the recovered clock quality of Clock and Data Recovery is the main issue of the receiver performance. However, from unknown incoming jitter, it is difficult to optimize loop dynamics to minimize steady-state and dynamic jitter. In this thesis a 10 Gb/s adaptive loop bandwidth clock and data recovery circuit with on-chip loop filter is presented. The proposed system optimizes the loop bandwidth adaptively to minimize jitter so that it leads to an improved jitter tolerance performance. This architecture tunes the loop bandwidth by a factor of eight based on the phase information of incoming data. The resulting architecture performs as good as a maximum fixed loop bandwidth CDR while tracking high speed input jitter and as good as a minimum fixed bandwidth CDR while suppressing wide bandwidth steady-state jitter. By employing a mixed mode predictor, high updating rate loop bandwidth adaptation is achieved with low power consumption. Another relevant feature is that it integrates a typically large off-chip filter using a capacitance multiplication technique that employs dual charge pumps. The functionality of the proposed architecture has been verified through schematic and behavioral model simulations. In the simulation, the performance of jitter tolerance is confirmed that the proposed solution provides improved results and robustness to the variation of jitter profile. Its applicability to industrial standards is also verified by the jitter tolerance passing SONET OC-192 successfully

    Clocking and Skew-Optimization For Source-Synchronous Simultaneous Bidirectional Links

    Get PDF
    There is continuous expansion of computing capabilities in mobile devices which demands higher I/O bandwidth and dense parallel links supporting higher data rates. Highspeed signaling leverages technology advancements to achieve higher data rates but is limited by the bandwidth of the electrical copper channel which have not scaled accordingly. To meet the continuous data-rate demand, Simultaneous Bi-directional (SBD) signaling technique is an attractive alternative relative to uni-directional signaling as it can work at lower clock speeds, exhibits better spectral efficiency and provides higher throughput in pad limited PCBs. For low-power and more robust system, the SBD transceiver should utilize forwarded clock system and per-pin de-skew circuits to correct the phase difference developed between the data and clock. The system can be configured in two roles, master and slave. To save more power, the system should have only one clock generator. The master has its own clock source and shares its clock to the slave through the clock channel, and the slave uses this forwarded clock to deserialize the inbound data and serialize the outbound data. A clock-to-data skew exists which can be corrected with a phase tracking CDR. This thesis presents a low-power implementation of forwarded clocking and clock-to-data skew optimization for a 40 Gbps SBD transceiver. The design is implemented in 28nm CMOS technology and consumes 8.8mW of power for 20 Gbps NRZ data at 0.9 V supply. The area occupied by the clocking 0.018 mm^2 area

    Design of clock and data recovery circuits for energy-efficient short-reach optical transceivers

    Get PDF
    Nowadays, the increasing demand for cloud based computing and social media services mandates higher throughput (at least 56 Gb/s per data lane with 400 Gb/s total capacity 1) for short reach optical links (with the reach typically less than 2 km) inside data centres. The immediate consequences are the huge and power hungry data centers. To address these issues the intra-data-center connectivity by means of optical links needs continuous upgrading. In recent years, the trend in the industry has shifted toward the use of more complex modulation formats like PAM4 due to its spectral efficiency over the traditional NRZ. Another advantage is the reduced number of channels count which is more cost-effective considering the required area and the I/O density. However employing PAM4 results in more complex transceivers circuitry due to the presence of multilevel transitions and reduced noise budget. In addition, providing higher speed while accommodating the stringent requirements of higher density and energy efficiency (< 5 pJ/bit), makes the design of the optical links more challenging and requires innovative design techniques both at the system and circuit level. This work presents the design of a Clock and Data Recovery Circuit (CDR) as one of the key building blocks for the transceiver modules used in such fibreoptic links. Capable of working with PAM4 signalling format, the new proposed CDR architecture targets data rates of 50−56 Gb/s while achieving the required energy efficiency (< 5 pJ/bit). At the system level, the design proposes a new PAM4 PD which provides a better trade-off in terms of bandwidth and systematic jitter generation in the CDR. By using a digital loop controller (DLC), the CDR gains considerable area reduction with flexibility to adjust the loop dynamics. At the circuit level it focuses on applying different circuit techniques to mitigate the circuit imperfections. It presents a wideband analog front end (AFE), suitable for a 56 Gb/s, 28-Gbaud PAM-4 signal, by using an 8x interleaved, master/ slave based sample and hold circuit. In addition, the AFE is equipped with a calibration scheme which corrects the errors associated with the sampling channels’ offset voltage and gain mismatches. The presented digital to phase converter (DPC) features a modified phase interpolator (PI), a new quadrature phase corrector (QPC) and multi-phase output with de-skewing capabilities.The DPC (as a standalone block) and the CDR (as the main focus of this work) were fabricated in 65-nm CMOS technology. Based on the measurements, the DPC achieves DNL/INL of 0.7/6 LSB respectively while consuming 40.5 mW power from 1.05 V supply. Although the CDR was not fully operational with the PAM4 input, the results from 25-Gbaud PAM2 (NRZ) test setup were used to estimate the performance. Under this scenario, the 1-UI JTOL bandwidth was measured to be 2 MHz with BER threshold of 10−4. The chip consumes 236 mW of power while operating on 1 − 1.2 V supply range achieving an energyefficiency of 4.27 pJ/bit

    Clock Synchronisation Assisted Clock and Data Recovery for Sub-Nanosecond Data Centre Optical Switching

    Get PDF
    In current `Cloud' data centres, switching of data between servers is performed using deep hierarchies of interconnected electronic packet switches. Demand for network bandwidth from emerging data centre workloads, combined with the slowing of silicon transistor scaling, is leading to a widening gap between data centre traffic demand and electronically-switched data centre network capacity. All-optical switches could offer a future-proof alternative, with potentially under a third of the power consumption and cost of electronically-switched networks. However, the effective bandwidth of optical switches depends on their overall switching time. This is dominated by the clock and data recovery (CDR) locking time, which takes hundreds of nanoseconds in commercial receivers. Current data centre traffic is dominated by small packets that transmit in tens of nanoseconds, leading to low effective bandwidth, as a high proportion of receiver time is spent performing CDR locking instead of receiving data, removing the benefits of optical switching. High-performance optical switching requires sub-nanosecond CDR locking time to overcome this limitation. This thesis proposes, models, and demonstrates clock synchronisation assisted CDR, which can achieve this. This approach uses clock synchronisation to simplify the complexity of CDR versus previous asynchronous approaches. An analytical model of the technique is first derived that establishes its potential viability. Following this, two approaches to clock synchronisation assisted CDR are investigated: 1. Clock phase caching, which uses clock phase storage and regular updates in a 2km intra-building scale data centre network interconnected by single-mode optical fibre. 2. Single calibration clock synchronisation assisted CDR}, which leverages the 20 times lower thermal sensitivity of hollow core optical fibre versus single-mode fibre to synchronise a 100m cluster scale data centre network, with a single initial phase calibration step. Using a real-time FPGA-based optical switch testbed, sub-nanosecond CDR locking time was demonstrated for both approaches

    Design of High-Speed SerDes Transceiver for Chip-to-Chip Communications in CMOS Process

    Get PDF
    With the continuous increase of on-chip computation capacities and exponential growth of data-intensive applications, the high-speed data transmission through serial links has become the backbone for modern communication systems. To satisfy the massive data-exchanging requirement, the data rate of such serial links has been updated from several Gb/s to tens of Gb/s. Currently, the commercial standards such as Ethernet 400GbE, InfiniBand high data rate (HDR), and common electrical interface (CEI)-56G has been developing towards 40+ Gb/s. As the core component within these links, the transceiver chipset plays a fundamental role in balancing the operation speed, power consumption, area occupation, and operation range. Meanwhile, the CMOS process has become the dominant technology in modern transceiver chip fabrications due to its large-scale digital integration capability and aggressive pricing advantage. This research aims to explore advanced techniques that are capable of exploiting the maximum operation speed of the CMOS process, and hence provides potential solutions for 40+ Gb/s CMOS transceiver designs. The major contributions are summarized as follows. A low jitter ring-oscillator-based injection-locked clock multiplier (RILCM) with a hybrid frequency tracking loop that consists of a traditional phase-locked loop (PLL), a timing-adjusted loop, and a loop selection state-machine is implemented in 65-nm C-MOS process. In the ring voltage-controlled oscillator, a full-swing pseudo-differential delay cell is proposed to lower the device noise to phase noise conversion. To obtain high operation speed and high detection accuracy, a compact timing-adjusted phase detector tightly combined with a well-matched charge pump is designed. Meanwhile, a lock-loss detection and lock recovery is devised to endow the RILCM with a similar lock-acquisition ability as conventional PLL, thus excluding the initial frequency set- I up aid and preventing the potential lock-loss risk. The experimental results show that the figure-of-merit of the designed RILCM reaches -247.3 dB, which is better than previous RILCMs and even comparable to the large-area LC-ILCMs. The transmitter (TX) and receiver (RX) chips are separately designed and fab- ricated in 65-nm CMOS process. The transmitter chip employs a quarter-rate multi-multiplexer (MUX)-based 4-tap feed-forward equalizer (FFE) to pre-distort the output. To increase the maximum operating speed, a bandwidth-enhanced 4:1 MUX with the capability of eliminating charge-sharing effect is proposed. To produce the quarter-rate parallel data streams with appropriate delays, a compact latch array associated with an interleaved-retiming technique is designed. The receiver chip employs a two-stage continuous-time linear equalizer (CTLE) as the analog front-end and integrates an improved clock data recovery to extract the sampling clocks and retime the incoming data. To automatically balance the jitter tracking and jitter suppression, passive low-pass filters with adaptively-adjusted bandwidth are introduced into the data-sampling path. To optimize the linearity of the phase interpolation, a time-averaging-based compensating phase interpolator is proposed. For equalization, a combined TX-FFE and RX-CTLE is applied to compensate for the channel loss, where a low-cost edge-data correlation-based sign zero-forcing adaptation algorithm is proposed to automatically adjust the TX-FFE’s tap weights. Measurement results show that the fabricated transmitter/receiver chipset can deliver 40 Gb/s random data at a bit error rate of 16 dB loss at the half-baud frequency, while consuming a total power of 370 mW

    Power-Proportional Optical Links

    Get PDF
    The continuous increase in data transfer rate in short-reach links, such as chip-to-chip and between servers within a data-center, demands high-speed links. As power efficiency becomes ever more important in these links, power-efficient optical links need to be designed. Power efficiency in a link can be achieved by enabling power-proportional communication over the serial link. In power-proportional links, the power dissipated by a link is proportional to the amount of data communicated. Normally, data-rate demand is not constant, and the peak data-rate is not required all the time. If a link is not adapted according to the data-rate demand, there will be a fixed power dissipation, and the power efficiency of the link will degrade during the sub-maximal link utilization. Adapting links to real-time data-rate requirements reduces power dissipation. Power proportionality is achieved by scaling the power of the serial link linearly with the link utilization, and techniques such as variable data-rate and burst-mode can be adopted for this purpose. Links whose data rate (and hence power dissipation) can be varied in response to system demands are proposed in this work. Past works have presented rapidly reconfigurable bandwidth in variable data-rate receivers, allowing lower power dissipation for lower data-rate operation. However, maintaining synchronization during reconfiguration was not possible since previous approaches have introduced changes in front-end delay when they are reconfigured. This work presents a technique that allows rapid bandwidth adjustment while maintaining a near-constant delay through the receiver suitable for a power-scalable variable data-rate optical link. Measurements of a fabricated integrated circuit (IC) show nearly constant energy per bit across a 2× variation in data rate while introducing less than 10 % of a unit interval (UI) of delay variation. With continuously increasing data communication in data-centers, parallel optical links with ever-increasing per-lane data rates are being used to meet overall throughput demands. Simultaneously, power efficiency is becoming increasingly important for these links since they do not transmit useful data all the time. The burst-mode solution for vertical-cavity surface-emitting laser (VCSEL)-based point-to-point communication can be used to improve links’ energy efficiency during low link activity. The burst-mode technique for VCSEL-based links has not yet been deployed commercially. Past works have presented burst-mode solutions for single-channel receivers, allowing lower power dissipation during low link activity and solutions for fast activation of the receivers. However, this work presents a novel technique that allows rapid activation of a front-end and fast locking of a clock-and-data-recovery (CDR) for a multi-channel parallel link, utilizing opportunities arising from the parallel nature of many VCSEL-based links. The idea has been demonstrated through electrical and optical measurements of a fabricated IC at 10 Gbps, which show fast data detection and activation of the circuitry within 49 UIs while allowing the front-end to achieve better energy efficiency during low link activity. Simulation results are also presented in support of the proposed technique which allows the CDR to lock within 26 UIs from when it is powered on

    Modelling and performance analysis of multigigabit serial interconnects using real number based analog verification methods

    Get PDF
    The increasing importance of multigigabit transceiver circuits in modern chip design calls for new methods of analyzing and integrating these challenging building blocks. This work presents a design and analysis framework basend on the SystemVerilog real number modeling ansatz. It further extends the simulation possibilities thus obtained by introducing additional higher level numeric modelling and evaluation methods to support multigigabit statistical link budgeting procedures based on the Peak Distortion Algorithm

    Modelização em MatLab® de interfaces de comunicação de alto débito

    Get PDF
    Mestrado em Engenharia Electrónica e TelecomunicaçõesNow-a-days, high-speed digital data transmission is under continuous development. The constant increasing on the bitrates has been lead to the need of more sophisticated and complex receivers, systems that provide the recovering of the transmitted data over a dispersive channel that degrades the transmitted signal quality. Therefore, the receiver shall compensate the distortion introduced by the channel as well as synchronize the received signal that in addition to distortion, is also affected by jitter. The distortion derived from the channel is attenuated by means of equalization circuits that offset the channel frequency response at the transmission rate, making it as flat as possible for the desired frequency. On the other hand, the synchronization of the received signal is achieved by means of clock and data recovery circuits that usually recover the clock signal through the data transitions for sampling the received data. The main focus of this thesis concerns the modeling of a data receiver for a high-speed interface. The simulation of the data receiver block implies the modeling of a transmission channel depending on its characteristics. The proposed transmission system, from the transmitter to the output of the data recovery block, includes equalization filters for signal conditioning, of which several distinct architectures are studied. It’s proposed two architectures for the clock and data recovery circuit. The first one is a 2x oversampling clock and data recovery circuit based on a Phase Tracking architecture. The second one, is a 3x oversampling clock and data recovery based on a Blind Sampling architecture. By modeling both of the architectures of the clock and data recovery circuit, it’s intended to analyze the respective jitter tolerance results. It is crucial to know the amount of jitter that can be tolerated by these circuits in order to recover the data with a satisfying bit error ratio. The obtained results show a very close match to the theoretical values, where the 2x and 3x oversampling architecture presents a jitter tolerance of, approximately, 12UI and 23UI respectively for low jitter frequencies.Hoje em dia, a transmissão de dados digital de alto débito binário encontra-se em constante evolução. O contínuo aumento das taxas de transmissão tem vindo a exigir sistemas de receção cada vez mais sofisticados e complexos, que facultem a recuperação dos dados transmitidos ao longo de um canal dispersivo que degrada a qualidade do sinal transmitido. Consequentemente, cabe ao recetor compensar a distorção introduzida pelo canal bem como a sincronização do sinal recebido que, para além de sofrer distorção, vem também afetado por jitter. A distorção introduzida pelo canal é atenuada através de circuitos de igualização, que compensam a resposta em frequência do canal à frequência de transmissão, de maneira a tornar a mesma o mais plana possível para a frequência desejada. Por sua vez, a sincronização do sinal recebido é conseguida através de circuitos de recuperação de dados e relógio, que, geralmente, geram um sinal de relógio a partir das transições do sinal de dados que é posteriormente utilizado para fazer a amostragem dos dados recebidos. O principal foco desta tese incide na modelação de um sistema de receção de dados de uma interface de alta velocidade. A simulação do bloco de receção de dados implica a modelação de um canal de transmissão em função das características do mesmo. O sistema de transmissão proposto, desde o transmissor até à saída do bloco de recuperação de dados, inclui filtros de igualização para acondicionamento de sinal, dos quais várias arquiteturas distintas são estudadas. São propostas duas arquiteturas para o circuito de recuperação de dados e relógio. A primeira trata-se de um circuito de recuperação de dados e relógio com sobre-amostragem 2x, baseado numa arquitetura de Phase Tracking. A segunda arquitetura trata-se de um circuito de recuperação de dados e relógio com sobre-amostragem 3x, baseado num arquitetura Blind Sampling. A análise de resultados da modelação de ambas as arquiteturas do circuito de recuperação de dados e relógio é realizada através da aquisição das respetivas curvas de tolerância de jitter. É fundamental conhecer a quantidade de jitter tolerado por estes circuitos a fim de recuperar os dados com uma probabilidade de erro de bit satisfatória. Os resultados obtidos mostram uma correspondência bastante próxima dos valores teóricos, onde a arquitetura com sobre-amostragem 2x apresenta uma tolerância de jitter de, aproximadamente, 12UI e a arquitetura com sobre-amostragem 3x apresenta uma tolerância de, aproximadamente, 23UI para baixas frequências de jitter
    corecore