Design Techniques for High Performance Serial Link Transceivers
Increasing data rates over electrical channels with significant frequency-dependent loss is difficult due to excessive inter-symbol interference (ISI). In order to achieve sufficient link margins at high rates, I/O system designers implement equalization in the transmitters and are motivated to consider more spectrally-efficient modulation formats relative to the common PAM-2 scheme, such as PAM-4 and duobinary.
The first work reviews when to consider the PAM-4 and duobinary formats, since the modulation scheme that yields the highest system margin at a given data rate depends on the channel loss profile. It presents a 20 Gb/s triple-mode transmitter capable of efficiently implementing all three modulation schemes together with three-tap feed-forward equalization. A statistical link modeling tool, which models ISI, crosstalk, random noise, and timing jitter, is developed to compare the three modulation formats on electrical backplane channel models. To improve duobinary modulation efficiency, a low-power quarter-rate duobinary precoder circuit is proposed that provides a significant timing-margin improvement relative to full-rate precoders.
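As a rough illustration of why duobinary signaling is paired with a precoder, the sketch below models the textbook scheme at the bit level. It assumes a modulo-2 precoder p[k] = d[k] XOR p[k-1] followed by a 1 + D response; the function names are hypothetical, and this is a behavioral model, not the quarter-rate precoder circuit proposed in the work.

```python
def duobinary_precode(bits, init=0):
    """Modulo-2 precoder: p[k] = d[k] XOR p[k-1]."""
    prev = init
    out = []
    for b in bits:
        prev ^= b
        out.append(prev)
    return out


def duobinary_encode(precoded, init=0):
    """1 + D response: y[k] = p[k] + p[k-1], giving levels 0, 1, 2."""
    prev = init
    out = []
    for p in precoded:
        out.append(p + prev)
        prev = p
    return out


def duobinary_decode(levels):
    """With precoding, each bit is recovered symbol by symbol as y[k] mod 2."""
    return [y % 2 for y in levels]


data = [1, 0, 1, 1, 0, 0, 1]
levels = duobinary_encode(duobinary_precode(data))
recovered = duobinary_decode(levels)
```

Because the precoder undoes the memory of the 1 + D response, the receiver needs no decision feedback to decode, so a single wrong level does not propagate into later bits.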
Also, as serial I/O data rates scale above 10 Gb/s, crosstalk between neighboring channels degrades system bit-error rate (BER) performance. The next work presents receive-side circuitry that merges the cancellation of both near-end and far-end crosstalk (NEXT/FEXT) and automatically adapts to different channel environments and to variations in process, voltage, and temperature.
NEXT cancellation is realized with a novel 3-tap FIR filter which combines two traditional FIR filter taps and a continuous-time band-pass filter IIR tap for efficient crosstalk cancellation, with all filter tap coefficients automatically determined via an on-die sign-sign least-mean-square (SS-LMS) adaptation engine. FEXT cancellation is realized by coupling the aggressor signal through a differentiator circuit whose gain is automatically adjusted with a power-detection-based adaptation loop.
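The sign-sign LMS rule mentioned above can be sketched in a few lines. The example below is a generic behavioral model under stated assumptions: a plain 3-tap FIR canceller (not the mixed FIR/IIR filter of the work), a binary aggressor, a noiseless residual with the victim data already removed, and made-up coupling coefficients. All names and values are hypothetical.

```python
import random


def sign(v):
    """Return -1, 0, or +1."""
    return (v > 0) - (v < 0)


def ss_lms_adapt(aggressor, received, n_taps=3, mu=1e-3):
    """Adapt canceller taps with w[i] += mu * sign(error) * sign(x[k-i])."""
    w = [0.0] * n_taps
    hist = [0.0] * n_taps  # most recent aggressor samples, newest first
    for x, r in zip(aggressor, received):
        hist = [x] + hist[:-1]
        error = r - sum(wi * xi for wi, xi in zip(w, hist))
        w = [wi + mu * sign(error) * sign(xi) for wi, xi in zip(w, hist)]
    return w


def make_crosstalk(aggressor, coupling):
    """Residual crosstalk seen at the victim for an assumed FIR coupling."""
    hist = [0.0] * len(coupling)
    out = []
    for x in aggressor:
        hist = [x] + hist[:-1]
        out.append(sum(c * h for c, h in zip(coupling, hist)))
    return out


rng = random.Random(0)
coupling = [0.2, -0.1, 0.05]  # assumed values, for illustration only
aggressor = [rng.choice((-1.0, 1.0)) for _ in range(5000)]
taps = ss_lms_adapt(aggressor, make_crosstalk(aggressor, coupling))
```

Only the signs of the error and of the aggressor samples are used, which is what makes the rule cheap to implement on die; the price is that each tap keeps dithering around its target by roughly the step size `mu`.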
In conclusion, the proposed transmitter-side and receiver-side architectures together offer a good solution for improving the performance of high-speed serial I/O links by overcoming physical channel loss and adjacent-channel noise as systems become more complex.
Design of Low-Power NRZ/PAM-4 Wireline Transmitters
Rapidly growing demand for instant multimedia access on a myriad of digital devices has pushed
the need for higher bandwidth in modern communication hardware, ranging from short-reach (SR)
memory/storage interfaces to long-reach (LR) data center Ethernet links. At the same time, comprehensive
design optimization of the link system for energy efficiency is required for mobile
computing and for low operational cost in data centers. This doctoral study consists of the design of two
low-swing wireline transmitters featuring low-power clock distribution and energy-efficient 2-tap
equalization at up to 20-Gb/s operation. In spite of the reduced signaling power of the
voltage-mode (VM) transmit driver, the presence of segment selection logic still diminishes the
power-saving benefit.
The first work presents a scalable VM transmitter which offers low static power dissipation
and adopts an impedance-modulated 2-tap equalizer with analog tap control, thereby obviating
driver segmentation and reducing pre-driver complexity and dynamic power. Per-channel quadrature
clock generation with injection-locked oscillators (ILO) allows the generation of rail-to-rail
quadrature clocks. Energy efficiency is further improved with capacitively driven low-swing global
clock distribution and supply scaling at lower data rates, while output eye quality is maintained at
low voltages with automatic phase calibration of the local ILO-generated quarter-rate clocks. A
prototype fabricated in a general purpose 65 nm CMOS process includes a 2 mm global clock
distribution network and two transmitters that support an output swing range of 100-300mV with
up to 12-dB of equalization. The transmitters achieve 8-16 Gb/s operation at 0.65-1.05 pJ/b energy
efficiency.
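To give a feel for what "up to 12 dB of equalization" means for a 2-tap equalizer, the sketch below uses the conventional normalized 2-tap FIR model with tap weights (1 - alpha, -alpha). This model is an assumption made for illustration; the impedance-modulated driver described above sets its peaking through analog impedance control, which is not modeled here, and the function names are hypothetical.

```python
import math


def two_tap_peaking_db(alpha):
    """Peaking of a 2-tap FIR with taps (1 - alpha, -alpha).

    DC (long runs of identical bits):  (1 - alpha) - alpha = 1 - 2*alpha
    Nyquist (alternating bits):        (1 - alpha) + alpha = 1
    """
    dc_gain = 1.0 - 2.0 * alpha
    nyquist_gain = 1.0
    return 20.0 * math.log10(nyquist_gain / dc_gain)


def alpha_for_peaking(peaking_db):
    """Invert the relation above: alpha = (1 - 10^(-dB/20)) / 2."""
    return (1.0 - 10.0 ** (-peaking_db / 20.0)) / 2.0


alpha_12db = alpha_for_peaking(12.0)
```

Under this model, 12 dB of peaking corresponds to a post-cursor weight of roughly 0.37, meaning the low-frequency swing is reduced to about a quarter of the full swing.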
The second work involves a dual-mode NRZ/PAM-4 differential low-swing voltage-mode (VM)
transmitter. The pulse-selected output multiplexing allows reduction of power supply and deterministic
jitter caused by the large on-chip parasitics inherent in the transmission-gate-based multiplexers
in the earlier work. Analog impedance control replica circuits running in the background produce
gate-biasing voltages that control the peaking ratio for 2-tap feed-forward equalization and
PAM-4 symbol levels for high-linearity. This analog control also allows for efficient generation of
the middle levels in PAM-4 operation with good linearity quantified by level separation mismatch
ratio of 95%. In NRZ mode, 2-tap feedforward equalization is configurable in high-performance
controlled-impedance or energy-efficient impedance-modulated settings to provide performance
scalability. Analytic design considerations covering dynamic power, data rate, mismatch, and output
swing guide the design toward optimal performance in the given technology node. The proof-of-concept
prototype is verified in silicon in a 65 nm CMOS process, with improved speed
and energy efficiency owing to double-stacked NMOS transistors in the output stage. The transmitter consumes as little as 29.6 mW in 20-Gb/s NRZ operation and 25.5 mW in 28-Gb/s PAM-4 operation.
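The power figures quoted above can be converted to energy per bit with simple arithmetic, since mW divided by Gb/s gives pJ/b directly. The helper below is a hypothetical convenience function, and it assumes the quoted power numbers cover the same circuitry in both modes.

```python
def energy_per_bit_pj(power_mw, rate_gbps):
    """Energy per bit in pJ/b: (1e-3 J/s) / (1e9 b/s) = 1e-12 J/b."""
    return power_mw / rate_gbps


nrz_pj_per_bit = energy_per_bit_pj(29.6, 20.0)   # 20-Gb/s NRZ mode
pam4_pj_per_bit = energy_per_bit_pj(25.5, 28.0)  # 28-Gb/s PAM-4 mode
```

This works out to about 1.48 pJ/b in NRZ mode and about 0.91 pJ/b in PAM-4 mode.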
Design of Energy-Efficient A/D Converters with Partial Embedded Equalization for High-Speed Wireline Receiver Applications
As the data rates of wireline communication links increase, channel impairments such as skin effect, dielectric loss, fiber dispersion, reflections and cross-talk become more pronounced. This warrants more interest in analog-to-digital converter (ADC)-based serial link receivers, as they allow for more complex and flexible back-end digital signal processing (DSP) relative to binary or mixed-signal receivers. Utilizing this back-end DSP allows for complex digital equalization and more bandwidth-efficient modulation schemes, while also displaying reduced process/voltage/temperature (PVT) sensitivity. Furthermore, these architectures offer straightforward design translation and can directly leverage the area and power scaling offered by new CMOS technology nodes. However, the power consumption of the ADC front-end and subsequent digital signal processing is a major issue. Embedding partial equalization inside the front-end ADC can potentially lower the complexity of the back-end DSP and/or decrease the ADC resolution requirement, which results in a more energy-efficient receiver. This dissertation presents efficient implementations of multi-GS/s time-interleaved ADCs with partial embedded equalization. The first prototype is a 6b 1.6GS/s ADC with a novel embedded redundant-cycle 1-tap DFE structure in 90nm CMOS. The other two prototypes are more complex 6b 10GS/s ADCs with efficiently embedded feed-forward equalization (FFE) and decision feedback equalization (DFE) in 65nm CMOS. Leveraging a time-interleaved successive approximation ADC architecture, new structures for embedded DFE and FFE are proposed with low power/area overhead. Measurement results over FR4 channels verify the effectiveness of the proposed embedded equalization schemes.
A comparison of the fabricated prototypes against state-of-the-art general-purpose ADCs in a similar speed/resolution range shows comparable performance, while the proposed architectures additionally include embedded equalization.
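For readers unfamiliar with decision feedback equalization, the sketch below shows the basic 1-tap operation in behavioral form: the previous decision, scaled by the first post-cursor coefficient, is subtracted before slicing. It assumes a noiseless two-tap channel with made-up coefficients and NRZ symbols, and it does not model how the DFE is embedded in the ADC conversion cycle; all names are hypothetical.

```python
def slicer(v):
    """Binary decision on an NRZ sample."""
    return 1 if v >= 0 else -1


def channel(symbols, h0=1.0, h1=0.6):
    """Assumed channel: main cursor h0 plus one post-cursor h1."""
    prev = 0
    out = []
    for s in symbols:
        out.append(h0 * s + h1 * prev)
        prev = s
    return out


def one_tap_dfe(samples, h1):
    """Subtract h1 * previous decision, then slice."""
    decisions, margins = [], []
    prev = 0
    for y in samples:
        equalized = y - h1 * prev
        d = slicer(equalized)
        decisions.append(d)
        margins.append(abs(equalized))
        prev = d
    return decisions, margins


symbols = [1, -1, -1, 1, 1, -1, 1, -1]
rx = channel(symbols)
decisions, margins = one_tap_dfe(rx, h1=0.6)
```

In this toy case the worst-case sample magnitude before equalization is about 0.4, while after feedback every sample sits at about 1.0. That recovered margin is what lets an ADC-based receiver get by with less resolution or less back-end DSP.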
Energy-Efficient Receiver Design for High-Speed Interconnects
High-speed interconnects are of vital importance to the operation of high-performance computing and communication systems, determining the ultimate bandwidth or data rates at which the information can be exchanged. Optical interconnects and the employment of high-order modulation formats are considered as the solutions to fulfilling the envisioned speed and power efficiency of future interconnects. One common key factor in bringing the success is the availability of energy-efficient receivers with superior sensitivity. To enhance the receiver sensitivity, improvement in the signal-to-noise ratio (SNR) of the front-end circuits, or equalization that mitigates the detrimental inter-symbol interference (ISI) is required. In this dissertation, architectural and circuit-level energy-efficient techniques serving these goals are presented.
First, an avalanche photodetector (APD)-based optical receiver is described, which utilizes non-return-to-zero (NRZ) modulation and is applicable to burst-mode operation. For the purposes of improving the overall optical link energy efficiency as well as the link bandwidth, this optical receiver is designed to achieve high sensitivity and high reconfiguration speed. The high sensitivity is enabled by optimizing the SNR at the front-end through adjusting the APD responsivity via its reverse bias voltage, along with the incorporation of 2-tap feedforward equalization (FFE) and 2-tap decision feedback equalization (DFE) implemented in current-integrating fashion. The high reconfiguration speed is empowered by the proposed integrating dc and amplitude comparators, which eliminate the RC settling time constraints. The receiver circuits, excluding the APD die, are fabricated in 28-nm CMOS technology. The optical receiver achieves bit-error-rate (BER) better than 1E−12 at −16-dBm optical modulation amplitude (OMA), 2.24-ns reconfiguration time with 5-dB dynamic range, and 1.37-pJ/b energy efficiency at 25 Gb/s.
Second, a 4-level pulse amplitude modulation (PAM4) wireline receiver is described, which incorporates continuous time linear equalizers (CTLEs) and a 2-tap direct DFE dedicated to the compensation for the first and second post-cursor ISI. The direct DFE in a PAM4 receiver (PAM4-DFE) is made possible by the proposed CMOS track-and-regenerate slicer. This proposed slicer offers rail-to-rail digital feedback signals with significantly improved clock-to-Q delay performance. The reduced slicer delay relaxes the settling time constraint of the summer circuits and allows the stringent DFE timing constraint to be satisfied. With the availability of a direct DFE employing the proposed slicer, inductor-based bandwidth enhancement and loop-unrolling techniques, which can be power/area intensive, are not required. Fabricated in 28-nm CMOS technology, the PAM4 receiver achieves BER better than 1E−12 and 1.1-pJ/b energy efficiency at 60 Gb/s, measured over a channel with 8.2-dB loss at Nyquist frequency.
Third, digital neural-network-enhanced FFEs (NN-FFEs) for PAM4 analog-to-digital converter (ADC)-based optical interconnects are described. The proposed NN-FFEs employ a custom learnable piecewise linear (PWL) activation function to tackle the nonlinearities with short memory lengths. In contrast to the conventional Volterra equalizers where multipliers are utilized to generate the nonlinear terms, the proposed NN-FFEs leverage the custom PWL activation function for nonlinear operations and reduce the required number of multipliers, thereby improving the area and power efficiencies. Applications in the optical interconnects based on micro-ring modulators (MRMs) are demonstrated with simulation results of 50-Gb/s and 100-Gb/s links adopting PAM4 signaling. The proposed NN-FFEs and the conventional Volterra equalizers are synthesized with the standard-cell libraries in a commercial 28-nm CMOS technology, and their power consumption and performance are compared. The proposed NN-FFEs achieve more than 37% lower power overhead than a Volterra equalizer that delivers a similar improvement in symbol-error-rate (SER) performance.
Design of Energy-Efficient Equalization and Data Encoding/Decoding Techniques for Wireline Communication Systems
Ever-increasing global internet data traffic has driven up the demand for cutting-edge high-speed wireline communication systems, including SerDes PHYs for various interfaces, interconnects, data center servers, and switches in optical systems. Operating wireline links at higher data rates leads to signals suffering from greater channel loss and to an exponential increase in power consumption, mainly caused by the larger amount of equalization required.
In this dissertation, two distinct methodologies for designing SerDes transceivers are presented: 1) a pulse width modulated (PWM) time-domain feed forward equalizer (FFE) and linearity improvement technique for higher-order pulse amplitude modulation (PAM) including PAM-8, and 2) an inter-symbol interference (ISI)-resilient data encoding and decoding technique with Dicode encoding and error correction logic for low-bandwidth wireline channels, as an alternative strategy for communicating in an energy-efficient way on bandwidth-limited wireline channels without using conventional equalizers or filters.
The first topic is a PAM-8 wireline transceiver with receiver-side pulse-width-modulated (PWM) or time-domain based feed forward equalization (FFE) technique. The receiver converts voltage-modulated signals or PAM signals to PWM signals and processes them using inverter based delay elements having rail to rail voltage swing. Time-to-voltage and voltage-to-time converters are designed to have non-linearity with opposite signs with the aim of achieving higher front-end linearity on the receiver. The proposed PAM-8 transceiver can operate from 12.0 Gb/s to 39.6 Gb/s and compensates 14 dB loss at 6.6 GHz with an efficiency of 8.66 pJ/bit in 65 nm CMOS.
The second topic is an alternative strategy for communicating on bandwidth-limited wireline channels without using conventional equalizers or filters (FFE, DFE, and CTLE): inter-symbol interference (ISI)-resilient Dicode encoding and error correction for low-bandwidth wireline channels. The key observation is that Dicode-encoded data have no consecutive 1s or -1s. With this known information, the error correction logic at the receiver can correct multi-bit errors due to ISI. Implemented in 65 nm CMOS, the proposed digital encoding and decoding approach can achieve a BER of less than 10^-12 while communicating over channels with insertion losses of 24.2 dB and 21.4 dB, with 2.56 pJ/bit and 2.66 pJ/bit efficiency, at 13.6 Gb/s and 16 Gb/s, respectively.
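The property that the error correction relies on can be illustrated with the textbook Dicode (1 - D) code, y[k] = x[k] - x[k-1], whose nonzero symbols always alternate in polarity. The sketch below is a bit-level illustration under that assumption; it only flags a violation of the alternation rule and does not reproduce the multi-bit error correction logic of the dissertation. Function names are hypothetical.

```python
def dicode_encode(bits, init=0):
    """y[k] = x[k] - x[k-1], giving symbols in {-1, 0, +1}."""
    prev = init
    out = []
    for b in bits:
        out.append(b - prev)
        prev = b
    return out


def dicode_decode(symbols, init=0):
    """Running sum recovers the bits: x[k] = x[k-1] + y[k]."""
    acc = init
    out = []
    for y in symbols:
        acc += y
        out.append(acc)
    return out


def violates_alternation(symbols):
    """True if two successive nonzero symbols share the same polarity."""
    last = 0
    for y in symbols:
        if y != 0:
            if y == last:
                return True
            last = y
    return False


bits = [0, 1, 1, 0, 1, 0, 0, 1]
encoded = dicode_encode(bits)
```

A received sequence such as `[1, 0, 1]` cannot come from a valid encoder, so a receiver can use that knowledge to detect where ISI has corrupted a symbol.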
Design of High-Speed SerDes Transceiver for Chip-to-Chip Communications in CMOS Process
With the continuous increase of on-chip computation capacity and the exponential growth of data-intensive applications, high-speed data transmission through serial links has become the backbone of modern communication systems. To satisfy the massive data-exchange requirement, the data rate of such serial links has risen from several Gb/s to tens of Gb/s. Currently, commercial standards such as Ethernet 400GbE, InfiniBand high data rate (HDR), and common electrical interface (CEI)-56G have been developing towards 40+ Gb/s. As the core component within these links, the transceiver chipset plays a fundamental role in balancing the operation speed, power consumption, area occupation, and operation range. Meanwhile, the CMOS process has become the dominant technology in modern transceiver chip fabrication due to its large-scale digital integration capability and aggressive pricing advantage. This research aims to explore advanced techniques that are capable of exploiting the maximum operation speed of the CMOS process, and hence provides potential solutions for 40+ Gb/s CMOS transceiver designs. The major contributions are summarized as follows.
A low-jitter ring-oscillator-based injection-locked clock multiplier (RILCM) with a hybrid frequency tracking loop that consists of a traditional phase-locked loop (PLL), a timing-adjusted loop, and a loop selection state-machine is implemented in a 65-nm CMOS process. In the ring voltage-controlled oscillator, a full-swing pseudo-differential delay cell is proposed to lower the device noise to phase noise conversion. To obtain high operation speed and high detection accuracy, a compact timing-adjusted phase detector tightly combined with a well-matched charge pump is designed. Meanwhile, a lock-loss detection and lock recovery scheme is devised to endow the RILCM with a lock-acquisition ability similar to that of a conventional PLL, thus excluding the initial frequency setup aid and preventing the potential lock-loss risk. The experimental results show that the figure-of-merit of the designed RILCM reaches -247.3 dB, which is better than previous RILCMs and even comparable to the large-area LC-ILCMs.
The transmitter (TX) and receiver (RX) chips are separately designed and fabricated in a 65-nm CMOS process. The transmitter chip employs a quarter-rate multi-multiplexer (MUX)-based 4-tap feed-forward equalizer (FFE) to pre-distort the output. To increase the maximum operating speed, a bandwidth-enhanced 4:1 MUX with the capability of eliminating the charge-sharing effect is proposed. To produce the quarter-rate parallel data streams with appropriate delays, a compact latch array associated with an interleaved-retiming technique is designed. The receiver chip employs a two-stage continuous-time linear equalizer (CTLE) as the analog front-end and integrates an improved clock data recovery to extract the sampling clocks and retime the incoming data. To automatically balance the jitter tracking and jitter suppression, passive low-pass filters with adaptively-adjusted bandwidth are introduced into the data-sampling path. To optimize the linearity of the phase interpolation, a time-averaging-based compensating phase interpolator is proposed. For equalization, a combined TX-FFE and RX-CTLE is applied to compensate for the channel loss, where a low-cost edge-data correlation-based sign zero-forcing adaptation algorithm is proposed to automatically adjust the TX-FFE's tap weights. Measurement results show that the fabricated transmitter/receiver chipset can deliver 40 Gb/s random data at a bit error rate of 16 dB loss at the half-baud frequency, while consuming a total power of 370 mW.
Network-on-Chip
Bus-based interconnects are limited in scalability, latency, bandwidth, and power consumption, and using them to support a very large number of on-chip resources results in a communication bottleneck. These challenges can be efficiently addressed with the implementation of a network-on-chip (NoC) system. This book gives a detailed analysis of various on-chip communication architectures and covers different areas of NoCs such as potentials, architecture, technical challenges, optimization, design explorations, and research directions. In addition, it discusses current and future trends that could make an impactful and meaningful contribution to the research and design of on-chip communications and NoC systems.
Design of High-Speed CMOS Interface Circuits for Optical Communications
Doctoral thesis (Ph.D.) -- Seoul National University Graduate School, College of Engineering, Department of Electrical and Computer Engineering, 2017. 8. 정덕균.
The bandwidth requirement of wireline communications has increased exponentially because of the ever-increasing demand for data centers and high-performance computing systems. However, it becomes difficult to satisfy the requirement with legacy electrical links, which suffer from frequency-dependent losses due to skin effect, dielectric loss, channel reflections, and crosstalk, resulting in a severe bandwidth limitation. In order to overcome this challenge, it is necessary to introduce optical communication technology, which has been mainly used for long-reach communications such as long-haul networks and metropolitan area networks, to medium- and short-reach communication systems. However, there still remain important issues to be resolved to facilitate the adoption of optical technologies. The most critical challenges are the energy efficiency and the cost competitiveness as compared to the legacy copper-based electrical communications. One possible solution is silicon photonics, which has long been investigated by a number of research groups. Despite the inherent incompatibility of silicon with the photonic world, silicon photonics is promising and is the only solution that can leverage the mature CMOS technologies.
In this thesis, we summarize the current status of silicon photonics and provide the prospect of the optical interconnection. We also present key circuit techniques essential to the implementation of high-speed and low-power optical receivers. And then, we propose optical receiver architectures satisfying the aforementioned requirements with novel circuit techniques.
CHAPTER 1 INTRODUCTION
1.1 MOTIVATION
1.2 THESIS ORGANIZATION
CHAPTER 2 BACKGROUND OF OPTICAL COMMUNICATION
2.1 OVERVIEW OF OPTICAL LINK
2.2 SILICON PHOTONICS
2.3 HYBRID INTEGRATION
2.4 SILICON-BASED PHOTODIODES
2.4.1 BASIC TERMINOLOGY
2.4.2 SILICON PD
2.4.3 GERMANIUM PD
2.4.4 INTEGRATION WITH WAVEGUIDE
CHAPTER 3 CIRCUIT TECHNIQUES FOR OPTICAL RECEIVER
3.1 BASIS OF TRANSIMPEDANCE AMPLIFIER
3.2 TOPOLOGY OF TIA
3.2.1 RESISTOR-BASED TIA
3.2.2 COMMON-GATE-BASED TIA
3.2.3 FEEDBACK-BASED TIA
3.2.4 INVERTER-BASED TIA
3.2.5 INTEGRATING RECEIVER
3.3 BANDWIDTH EXTENSION TECHNIQUES
3.3.1 INDUCTOR-BASED TECHNIQUE
3.3.2 EQUALIZATION
3.4 CLOCK AND DATA RECOVERY CIRCUITS
3.4.1 CDR BASIC
3.4.2 CDR EXAMPLES
CHAPTER 4 LOW-POWER OPTICAL RECEIVER FRONT-END
4.1 OVERVIEW
4.2 INVERTER-BASED TIA WITH RESISTIVE FEEDBACK
4.3 INVERTER-BASED TIA WITH RESISTIVE AND INDUCTIVE FEEDBACK
4.4 CIRCUIT IMPLEMENTATION
4.5 MEASUREMENT RESULTS
CHAPTER 5 BANDWIDTH- AND POWER-SCALABLE OPTICAL RECEIVER FRONT-END
5.1 OVERVIEW
5.2 BANDWIDTH AND POWER SCALABILITY
5.3 GM STABILIZATION
5.4 OVERALL BLOCK DIAGRAM OF RECEIVER
5.5 MEASUREMENT RESULTS
CHAPTER 6 CONCLUSION
BIBLIOGRAPHY
ABSTRACT (IN KOREAN)
Design techniques for clocking high performance signaling systems
Scaling of CMOS technology has progressed relentlessly for the past several
decades. In order for this unprecedented scaling to benefit the performance of
large digital systems, the communication bandwidth between integrated circuits
(ICs) must scale accordingly. However, interconnect technology does not scale as
aggressively, making communication between chips the major bottleneck in overall
system performance. In addition, supply voltage scaling, increasing device leakage,
and increased noise make existing signaling circuits inefficient and difficult to scale.
In this thesis, both analog and digital enhancement techniques to mitigate
scaling-related issues and improve the performance of building blocks used in
high-speed signaling systems are discussed. A digital-to-phase converter (DPC) with a
resolution better than 100 femtoseconds, a hybrid analog/digital clock
and data recovery (CDR) architecture that improves the tracking range of
traditional CDRs by an order of magnitude, and a digital CDR architecture that
obviates the need for the charge pump and the large, area-consuming loop filter
while achieving error-free operation are presented. Measured results obtained from
the prototype chips are presented to illustrate the proposed design techniques.
Keywords: CDR, PL
Research and design of high-speed advanced analogue front-ends for fibre-optic transmission systems
In the last decade, we have witnessed the emergence of large, warehouse-scale data centres which have enabled new internet-based software applications such as cloud computing, search engines, social media, and e-government. Such data centres consist of large collections of servers interconnected using short-reach (up to a few hundred meters) optical interconnect. Today, transceivers for these applications achieve up to 100Gb/s by multiplexing 10x 10Gb/s or 4x 25Gb/s channels. In the near future, however, data centre operators have expressed a need for optical links which can support 400Gb/s up to 1Tb/s. The crucial challenge is to achieve this in the same footprint (same transceiver module) and with similar power consumption as today's technology. Straightforward scaling of the currently used space or wavelength division multiplexing may be difficult to achieve: indeed, a 1Tb/s transceiver would require integration of 40 VCSELs (vertical cavity surface emitting laser diodes, widely used for short-reach optical interconnect), 40 photodiodes and the electronics operating at 25Gb/s in the same module as today's 100Gb/s transceiver. Pushing the bit rate on such links beyond today's commercially available 100Gb/s/fibre will require new generations of VCSELs and their driver and receiver electronics. This work looks into a number of state-of-the-art technologies, investigates their performance constraints, and recommends a different set of designs, specifically targeting multilevel modulation formats. Several methods to extend the bandwidth using deep submicron (65nm and 28nm) CMOS technology are explored in this work, while also maintaining a focus upon reducing power consumption and chip area. The techniques used were pre-emphasis on the rising and falling edges of the signal and bandwidth extension by inductive peaking and different local feedback techniques.
These techniques have been applied to a transmitter and receiver developed for advanced modulation formats such as PAM-4 (4-level pulse amplitude modulation). Such a modulation format can increase the throughput per individual channel, which helps to overcome the challenges mentioned above to realize 400Gb/s to 1Tb/s transceivers.