334 research outputs found
Characterization of a gigabit transceiver for the ATLAS inner tracker pixel detector readout upgrade
We present a gigabit transceiver prototype Application Specific Integrated
Circuit (ASIC), GBCR, for the ATLAS Inner Tracker (ITk) Pixel detector readout
upgrade. GBCR is designed in a 65-nm CMOS technology and consists of four
upstream receiver channels, a downstream transmitter channel, and an
Inter-Integrated Circuit (I2C) slave. The upstream channels receive the data at
5.12 Gbps passing through 5-meter 34-American Wire Gauge (AWG) Twin-axial
(Twinax) cables, equalize them, retime them with a recovered clock, and then
drive an optical transmitter. The downstream channel receives the data at 2.56
Gbps from an optical receiver and drives the cable as same as the upstream
channels. The jitter of the upstream channel output is measured to be 35 ps
(peak-peak) when the Clock-Data Recovery (CDR) module is turned on and the
jitter of the downstream channel output after the cable is 138 ps (peak-peak).
The power consumption of each upstream channel is 72 mW when the CDR module is
turned on and the downstream channel consumes 27 mW. GBCR survives the total
ionizing dose of 200 kGy.Comment: 11 pages, 14 figure
2.5 Gbps clock data recovery using 1/4th-rate quadricorrelator frequency detector and skew-calibrated multi-phase clock generator
A Gb/s clock and data recovery (CDR) circuit using 1/4th-rate digital quadricorrelator frequency detector and skew-calibrated multi-phase voltage-controlled oscillator is presented. With 1/4th-rate clock architecture, the coil-free oscillator can have lower operation frequency providing sufficient low-jitter operation. Moreover, it is an inherent 1-to-4 DEMUX. The skew calibration scheme is applied to reduce phase offset in multi-phase clock generator. The CDR with frequency detector can have small loop bandwidth, wide pull-in range and can operate without the need for a local reference clock. This 1/4th-rate CDR is implemented in standard 0.18 μm CMOS technology. It has an active area of 0.7 mm2 and consumes 100 mW at 1.8 V supply. The CDR has low jitter operation in a wide frequency range from 1–2.25 Gb/s. Measurement of Bit-Error Rate is less than 10−12 for 2.25 Gb/s incoming data 27−1 PRBS, jitter peak-to-peak of 0.7 unit interval (UI) modulation at 10 MHz
Recommended from our members
Architectures and Circuits Leveraging Injection-Locked Oscillators for Ultra-Low Voltage Clock Synthesis and Reference-less Receivers for Dense Chip-to-Chip Communications
High performance computing is critical for the needs of scientific discovery and economic competitiveness. An extreme-scale computing system at 1000x the performance of today’s petaflop machines will exhibit massive parallelism on multiple vertical fronts, from thousands of computational units on a single processor to thousands of processors in a single data center. To facilitate such a massively-parallel extreme-scale computing, a key challenge is power. The challenge is not power associated with base computation but rather the problem of transporting data from one chip to another at high enough rates. This thesis presents architectures and techniques to achieve low power and area footprint while achieving high data rates in a dense very-short reach (VSR) chip-to-chip (C2C) communication network. High-speed serial communication operating at ultra-low supplies improves the energy-efficiency and lowers the power envelop of a system doing an exaflop of loops. One focus area of this thesis is clock synthesis for such energy-efficient interconnect applications operating at high speeds and ultra-low supplies. A sub-integer clockfrequency synthesizer is presented that incorporates a multi-phase injection-locked ring-oscillator-based prescaler for operation at an ultra-low supply voltage of 0.5V, phase-switching based programmable division for sub-integer clock-frequency synthesis, and automatic calibration to ensure injection lock. A record speed of 9GHz has been demonstrated at 0.5V in 45nm SOI CMOS. It consumes 3.5mW of power at 9.12GHz and 0.052 of area, while showing an output phase noise of -100dBc/Hz at 1MHz offset and RMS jitter of 325fs; it achieves a net of -186.5 in a 45-nm SOI CMOS process. This thesis also describes a receiver with a reference-less clocking architecture for high-density VSR-C2C links. This architecture simplifies clock-tree planning in dense extreme-scaling computing environments and has high-bandwidth CDR to enable SSC for suppressing EMI and to mitigate TX jitter requirements. It features clock-less DFE and a high-bandwidth CDR based on master-slave ILOs for phase generation/rotation. The RX is implemented in 14nm CMOS and characterized at 19Gb/s. It is 1.5x faster that previous reference-less embedded-oscillator based designs with greater than 100MHz jitter tolerance bandwidth and recovers error-free data over VSR-C2C channels. It achieves a power-efficiency of 2.9pJ/b while recovering error-free data (BER 200MHz and the INL of the ILO-based phase-rotator (32- Steps/UI) is <1-LSB. Lastly, this thesis develops a time-domain delay-based modeling of injection locking to describe injection-locking phenomena in nonharmonic oscillators. The model is used to predict the locking bandwidth, and the locking dynamics of the locked oscillator. The model predictions are verified against simulations and measurements of a four-stage differential ring oscillator. The model is further used to predict the injection-locking behavior of a single-ended CMOS inverter based ring oscillator, the lock range of a multi-phase injection-locked ring-oscillator-based prescaler, as well as the dynamics of tracking injection phase perturbations in injection-locked masterslave oscillators; demonstrating its versatility in application to any nonharmonic oscillator
Analysis and Design of Robust Multi-Gb/s Clock and Data Recovery Circuits
The bandwidth demands of modern computing systems have been continually increasing and the recent focus on parallel processing will only increase the demands placed on data communication circuits. As data rates enter the multi-Gb/s range, serial data communication architectures become attractive as compared to parallel architectures. Serial architectures have long been used in fibre optic systems for long-haul applications, however, in the past decade there has been a trend towards multi-Gb/s backplane interconnects. The integration of clock and data recovery (CDR) circuits into monolithic integrated circuits (ICs) is attractive as it improves performance and reduces the system cost, however it also introduces new challenges, one of which is robustness. In serial data communication systems the CDR circuit is responsible for recovering the data from an incoming data stream. In recent years there has been a great deal of research into integrating CDR circuits into monolithic ICs. Most research has focused on increasing the bandwidth of the circuits, however in order to integrate multi-Gb/s CDR circuits robustness, as well as performance, must be considered. In this thesis CDR circuits are analyzed with respect to their robustness. The phase detector is a critical block in a CDR circuit and its robustness will play a significant role in determining the overall performance in the presence of process non-idealities. Several phase detector architectures are analyzed to determine the effects of process non-idealities. Static phase offsets are introduced as a figure of merit for phase detectors and a mathematical framework is described to characterize the negative effects of static phase offsets on CDR circuits. Two approaches are taken to improve the robustness of CDR circuits. First, calibration circuits are introduced which correct for static phase offsets in CDR circuits. Secondly, phase detector circuits are introduced which have been designed to optimize both performance and robustness. Several prototype chips which implement these schemes will be described and measured results will be presented. These results show that while CDR circuits are vulnerable to the effects of process non-idealities, there are circuit techniques which can mitigate many of these concerns
Clock And Data Recovery Using Bang-bang Pll’s
Tez (Yüksek Lisans) -- İstanbul Teknik Üniversitesi, Fen Bilimleri Enstitüsü, 2008Thesis (M.Sc.) -- İstanbul Technical University, Institute of Science and Technology, 2008Bu çalışmada, saat ve data işaretlerinin yeniden çıkarımında kullanılan iki konumlu faz kititlemeli çevrimlerden bahsedilmiştir. Sistem seviyesinde hızlı simülasyonlar yapabilmek amacıyla çevrim elemanlarının davranışsal modelleri geliştirilmiştir. İki konumlu kontrol sistemlerinin el ile analizinin oldukça zor olmasından dolayı modelleme zorunlu hale gelmektedir. Ayrıca gerçeklenen elemanların idealsizliklerinden kaynaklanan davranışlar da olabilidiğince modellenmeye çalışılmıştır. Söz konusu faz kilitlemeli çevrimlerin sistem seviyesinde sağlaması gereken özelliklerin kabaca hesaplanması ve datadaki değişim sıklığının bu özellikleri nasıl etkilediği anlatılmıştır. Çevrim elemanlarının tranzistör seviyesinde nasıl gerçeklendiklerinden bahsedilmiştir. Çok kullanılan bir ring osilatör yapısı olan simetrik yüklü osilatör (Maneatis yük) çevrimde etkili bir şekilde kullanabilmek amacıyla modifiye edilmiştir. Osilatörün üretim ve sıcaklık değişimlerini tolere edebilmesi için kazancının yüksek olması gerekir. Bu da sistemin harici gürültü kaynaklarına (besleme, taban gürültüsü gibi) olan duyarlılığını oldukça arttırmaktadır. Bu nedenle osilatörü otomatik olarak kalibre eden bir teknik geliştirilmiştir. Değişik faz kilitlemeli çevrimlere uygulanabilen teknik için osilatörün akım kontollü olması gerekmektedir. Frekans kitlenmesi gerçekleştikten sonra osilatörün akımı bir analog-sayısal çevirici ile örneklenmekte ve asıl sistem bu nokta etrafında daha dar bir bölgede çalışmaktadır. Ayrıca, sıcaklıktan kaynaklanabilecek değişimler de analog-sayısal dönüştürücünün refererans akımı üzerinden kompanze edilmektedir. Son olarak, tasarlanan sistemin simülasyon sonuçları verilmiştir. 0.18um CMOS teknolojisinde tasarlanan devre 5Gb/s data hızlarında çalışabilmektedir.In this work, bang-bang PLL structures, which are extensively used in clock and data recovery systems, are investigated. Behavioral models of loop elements are created to do faster simulations in system level. This step is mandatory in bang-bang systems, which are hard to analyze with simple calculations. Some non-idealities of real circuit elements are inserted to these models. System level design issues of bang-bang PLL’s are discussed and the effect of data transition density to system specifications is mentioned. Transistor level implementations of loop elements are described. A popular delay cell with symmetric loads (Maneatis cell) is modified to be used effectively in a bang-bang loop. Gain of the VCO seems very large after initial design, which is required to cover the operating frequency range over process and temperature corners. Large gain makes the system prone to external noise sources such as noise from power supply, substrate etc. Therefore, an automatic calibration method is developed to reduce the VCO gain. This technique can be applied to any current controlled oscillators in various phase locked loops. After frequency lock is achieved, current of the oscillator is sampled by a current mode ADC and a narrower range is generated around that point. Additionally, frequency variation due to temperature is compensated through the specifically designed reference current of ADC. Finally, simulation results of CDR and calibration circuits are given. CDR is designed in 0.18um CMOS technology and can operate at 5Gb/s data rate.Yüksek LisansM.Sc
Clock Synchronisation Assisted Clock and Data Recovery for Sub-Nanosecond Data Centre Optical Switching
In current `Cloud' data centres, switching of data between servers is performed using deep hierarchies of interconnected electronic packet switches. Demand for network bandwidth from emerging data centre workloads, combined with the slowing of silicon transistor scaling, is leading to a widening gap between data centre traffic demand and electronically-switched data centre network capacity. All-optical switches could offer a future-proof alternative, with potentially under a third of the power consumption and cost of electronically-switched networks. However, the effective bandwidth of optical switches depends on their overall switching time. This is dominated by the clock and data recovery (CDR) locking time, which takes hundreds of nanoseconds in commercial receivers. Current data centre traffic is dominated by small packets that transmit in tens of nanoseconds, leading to low effective bandwidth, as a high proportion of receiver time is spent performing CDR locking instead of receiving data, removing the benefits of optical switching. High-performance optical switching requires sub-nanosecond CDR locking time to overcome this limitation. This thesis proposes, models, and demonstrates clock synchronisation assisted CDR, which can achieve this. This approach uses clock synchronisation to simplify the complexity of CDR versus previous asynchronous approaches. An analytical model of the technique is first derived that establishes its potential viability. Following this, two approaches to clock synchronisation assisted CDR are investigated: 1. Clock phase caching, which uses clock phase storage and regular updates in a 2km intra-building scale data centre network interconnected by single-mode optical fibre. 2. Single calibration clock synchronisation assisted CDR}, which leverages the 20 times lower thermal sensitivity of hollow core optical fibre versus single-mode fibre to synchronise a 100m cluster scale data centre network, with a single initial phase calibration step. Using a real-time FPGA-based optical switch testbed, sub-nanosecond CDR locking time was demonstrated for both approaches
Design of High-Speed SerDes Transceiver for Chip-to-Chip Communications in CMOS Process
With the continuous increase of on-chip computation capacities and exponential growth of data-intensive applications, the high-speed data transmission through serial links has become the backbone for modern communication systems. To satisfy the massive data-exchanging requirement, the data rate of such serial links has been updated from several Gb/s to tens of Gb/s. Currently, the commercial standards such as Ethernet 400GbE, InfiniBand high data rate (HDR), and common electrical interface (CEI)-56G has been developing towards 40+ Gb/s. As the core component within these links, the transceiver chipset plays a fundamental role in balancing the operation speed, power consumption, area occupation, and operation range. Meanwhile, the CMOS process has become the dominant technology in modern transceiver chip fabrications due to its large-scale digital integration capability and aggressive pricing advantage. This research aims to explore advanced techniques that are capable of exploiting the maximum operation speed of the CMOS process, and hence provides potential solutions for 40+ Gb/s CMOS transceiver designs. The major contributions are summarized as follows.
A low jitter ring-oscillator-based injection-locked clock multiplier (RILCM) with a hybrid frequency tracking loop that consists of a traditional phase-locked loop (PLL), a timing-adjusted loop, and a loop selection state-machine is implemented in 65-nm C-MOS process. In the ring voltage-controlled oscillator, a full-swing pseudo-differential delay cell is proposed to lower the device noise to phase noise conversion. To obtain high operation speed and high detection accuracy, a compact timing-adjusted phase detector tightly combined with a well-matched charge pump is designed. Meanwhile, a lock-loss detection and lock recovery is devised to endow the RILCM with a similar lock-acquisition ability as conventional PLL, thus excluding the initial frequency set- I up aid and preventing the potential lock-loss risk. The experimental results show that the figure-of-merit of the designed RILCM reaches -247.3 dB, which is better than previous RILCMs and even comparable to the large-area LC-ILCMs.
The transmitter (TX) and receiver (RX) chips are separately designed and fab- ricated in 65-nm CMOS process. The transmitter chip employs a quarter-rate multi-multiplexer (MUX)-based 4-tap feed-forward equalizer (FFE) to pre-distort the output. To increase the maximum operating speed, a bandwidth-enhanced 4:1 MUX with the capability of eliminating charge-sharing effect is proposed. To produce the quarter-rate parallel data streams with appropriate delays, a compact latch array associated with an interleaved-retiming technique is designed. The receiver chip employs a two-stage continuous-time linear equalizer (CTLE) as the analog front-end and integrates an improved clock data recovery to extract the sampling clocks and retime the incoming data. To automatically balance the jitter tracking and jitter suppression, passive low-pass filters with adaptively-adjusted bandwidth are introduced into the data-sampling path. To optimize the linearity of the phase interpolation, a time-averaging-based compensating phase interpolator is proposed. For equalization, a combined TX-FFE and RX-CTLE is applied to compensate for the channel loss, where a low-cost edge-data correlation-based sign zero-forcing adaptation algorithm is proposed to automatically adjust the TX-FFE’s tap weights. Measurement results show that the fabricated transmitter/receiver chipset can deliver 40 Gb/s random data at a bit error rate of 16 dB loss at the half-baud frequency, while consuming a total power of 370 mW
Energy-efficient wireline transceivers
Power-efficient wireline transceivers are highly demanded by many applications in high performance computation and communication systems. Apart from transferring a wide range of data rates to satisfy the interconnect bandwidth requirement, the transceivers have very tight power budget and are expected to be fully integrated. This thesis explores enabling techniques to implement such transceivers in both circuit and system levels. Specifically, three prototypes will be presented: (1) a 5Gb/s reference-less clock and data recovery circuit (CDR) using phase-rotating phase-locked loop (PRPLL) to conduct phase control so as to break several fundamental trade-offs in conventional receivers; (2) a 4-10.5Gb/s continuous-rate CDR with novel frequency acquisition scheme based on bang-bang phase detector (BBPD) and a ring oscillator-based fractional-N PLL as the low noise wide range DCO in the CDR loop; (3) a source-synchronous energy-proportional link with dynamic voltage and frequency scaling (DVFS) and rapid on/off (ROO) techniques to cut the link power wastage at system level. The receiver/transceiver architectures are highly digital and address the requirements of new receiver architecture development, wide operating range, and low power/area consumption while being fully integrated. Experimental results obtained from the prototypes attest the effectiveness of the proposed techniques
Digital Centric Multi-Gigabit SerDes Design and Verification
Advances in semiconductor manufacturing still lead to ever decreasing feature sizes and constantly allow higher degrees of integration in application specific integrated circuits (ASICs). Therefore the bandwidth requirements on the external interfaces of such systems on chips (SoC) are steadily growing. Yet, as the number of pins on these ASICs is not increasing in the same pace - known as pin limitation - the bandwidth per pin has to be increased.
SerDes (Serializer/Deserializer) technology, which allows to transfer data serially at very high data rates of 25Gbps and more is a key technology to overcome pin limitation and exploit the computing power that can be achieved in todays SoCs. As such SerDes blocks together with the digital logic interfacing them form complex mixed signal systems, verification of performance and functional correctness is very challenging.
In this thesis a novel mixed-signal design methodology is proposed, which tightly couples model and implementation in order to ensure consistency throughout the design cycles and hereby accelerate the overall implementation flow. A tool flow that has been developed is presented, which integrates well into state of the art electronic design automation (EDA) environments and enables the usage of this methodology in practice.
Further, the design space of todays high-speed serial links is analyzed and an architecture is proposed, which pushes complexity into the digital domain in order to achieve robustness, portability between manufacturing processes and scaling with advanced node technologies. The all digital phase locked loop (PLL) and clock data recovery (CDR), which have been developed are described in detail.
The developed design flow was used for the implementation of the SerDes architecture in a 28nm silicon process and proved to be indispensable for future projects
Design of High-Speed Power-Efficient A/D Converters for Wireline ADC-Based Receiver Applications
Serial input/output (I/O) data rates are increasing in order to support the explosion in network traffic driven by big data applications such as the Internet of Things (IoT), cloud computing and etc. As the high-speed data symbol times shrink, this results in an increased amount of inter-symbol interference (ISI) for transmission over both severe low-pass electrical channels and dispersive optical channels. This necessitates increased equalization complexity and consideration of advanced modulation schemes, such as four-level pulse amplitude modulation (PAM-4). Serial links which utilize an analog-to-digital converter (ADC) receiver front-end offer a potential solution, as they enable more powerful and flexible digital signal processing (DSP) for equalization and symbol detection and can easily support advanced modulation schemes. Moreover, the DSP back-end provides robustness to process, voltage, and temperature (PVT) variations, benefits from improved area and power with CMOS technology scaling and offers easy design transfer between different technology nodes and thus improved time-to-market. However, ADC-based receivers generally consume higher power relative to their mixed-signal counterparts because of the significant power consumed by conventional multi-GS/s ADC implementations. This motivates exploration of energy-efficient ADC designs with moderate resolution and very high sampling rates to support data rates at or above 50Gb/s.
This dissertation presents two power-efficient designs of ≥25GS/s time-interleaved ADCs for ADC-based wireline receivers. The first prototype includes the implementation of a 6b 25GS/s time-interleaved multi-bit search ADC in 65nm CMOS with a soft-decision selection algorithm that provides redundancy for relaxed track-and-hold (T/H) settling and improved metastability tolerance, achieving a figure-of-merit (FoM) of 143fJ/conversion step and 1.76pJ/bit for a PAM-4 receiver design. The second prototype features the design of a 52Gb/s PAM-4 ADC-based receiver in 65nm CMOS, where the front-end consists of a 4-stage continuous-time linear equalizer (CTLE)/variable gain amplifier (VGA) and a 6b 26GS/s time-interleaved SAR ADC with a comparator-assisted 2b/stage structure for reduced digital-to-analog converter (DAC) complexity and a 3-tap embedded feed-forward equalizer (FFE) for relaxed ADC resolution requirement. The receiver front-end achieves an efficiency of 4.53bJ/bit, while compensating for up to 31dB loss with DSP and no transmitter (TX) equalization
- …