Design Techniques for High Performance Serial Link Transceivers
Increasing data rates over electrical channels with significant frequency-dependent loss is difficult due to excessive inter-symbol interference (ISI). In order to achieve sufficient link margins at high rates, I/O system designers implement equalization in the transmitters and are motivated to consider more spectrally-efficient modulation formats relative to the common PAM-2 scheme, such as PAM-4 and duobinary.
The first work reviews when to consider the PAM-4 and duobinary formats, since the modulation scheme that yields the highest system margin at a given data rate is a function of the channel loss profile, and presents a 20Gb/s triple-mode transmitter capable of efficiently implementing these three modulation schemes and three-tap feedforward equalization. A statistical link modeling tool, which models ISI, crosstalk, random noise, and timing jitter, is developed to compare the three common modulation formats operating on electrical backplane channel models. To improve duobinary modulation efficiency, a low-power quarter-rate duobinary precoder circuit is proposed, which provides significant timing margin improvement relative to full-rate precoders.
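The abstract does not describe how the quarter-rate precoder is built. As a behavioral sketch only (not the thesis circuit), the serial recursion d[k] = b[k] XOR d[k-1] can be unrolled into prefix XORs over each 4-bit word, so only one value is fed back per quarter-rate clock instead of one per bit. All function names below are hypothetical.

```python
from itertools import accumulate
from operator import xor


def precode_full_rate(bits, d_prev=0):
    """Serial duobinary precoder: d[k] = b[k] XOR d[k-1], one feedback per bit."""
    out = []
    for b in bits:
        d_prev ^= b
        out.append(d_prev)
    return out


def precode_quarter_rate(bits, d_prev=0):
    """Behavioral quarter-rate precoder: 4 bits per clock via prefix XOR."""
    if len(bits) % 4:
        raise ValueError("bit count must be a multiple of 4")
    out = []
    for k in range(0, len(bits), 4):
        # d[k+i] = d_prev ^ b[k] ^ ... ^ b[k+i]; [1:] drops the seed value itself.
        word = list(accumulate(bits[k:k + 4], xor, initial=d_prev))[1:]
        out.extend(word)
        d_prev = word[-1]  # the only value fed back, once per 4 bit periods
    return out


def duobinary_encode(precoded, d_prev=0):
    """Three-level symbols y[k] = d[k] + d[k-1], each in {0, 1, 2}."""
    previous = [d_prev] + precoded[:-1]
    return [d + p for d, p in zip(precoded, previous)]


def duobinary_decode(symbols):
    """Precoding makes detection memoryless: b[k] = y[k] mod 2."""
    return [y % 2 for y in symbols]


bits = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0]
d = precode_quarter_rate(bits)
print(d == precode_full_rate(bits))                   # True
print(duobinary_decode(duobinary_encode(d)) == bits)  # True
```

Because the precoder makes duobinary detection symbol-by-symbol, a single decision error at the receiver does not propagate into later bits.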
Also, as serial I/O data rates scale above 10 Gb/s, crosstalk between neighboring channels degrades system bit-error rate (BER) performance. The next work presents receive-side circuitry that merges the cancellation of both near-end and far-end crosstalk (NEXT/FEXT) and can automatically adapt to different channel environments and to variations in process, voltage, and temperature.
NEXT cancellation is realized with a novel 3-tap FIR filter that combines two traditional FIR filter taps and a continuous-time band-pass IIR tap for efficient crosstalk cancellation, with all filter tap coefficients automatically determined by an on-die sign-sign least-mean-square (SS-LMS) adaptation engine. FEXT cancellation is realized by coupling the aggressor signal through a differentiator circuit whose gain is automatically adjusted by a power-detection-based adaptation loop.
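A sign-sign LMS loop adapts each canceller tap using only the signs of the error and of the aggressor data. The sketch below is a simplified discrete-time model under stated assumptions: binary ±1 data on both lanes, a hypothetical 3-tap FIR crosstalk response `h`, Gaussian receiver noise, and a slicer whose decisions are correct. It does not model the continuous-time band-pass IIR tap or the FEXT differentiator.

```python
import random


def ss_lms_crosstalk(victim, aggressor, h, mu=1e-3, noise_std=0.05, seed=0):
    """Adapt canceller taps c[k] toward the (unknown) crosstalk response h[k].

    Hypothetical model: rx[n] = victim[n] + sum_k h[k]*aggressor[n-k] + noise.
    The canceller subtracts sum_k c[k]*aggressor[n-k]; SS-LMS uses only signs.
    """
    rng = random.Random(seed)
    ntaps = len(h)
    c = [0.0] * ntaps
    for n in range(ntaps - 1, len(victim)):
        a = [aggressor[n - k] for k in range(ntaps)]  # a[n], a[n-1], ...
        rx = victim[n] + sum(hk * ak for hk, ak in zip(h, a)) + rng.gauss(0.0, noise_std)
        z = rx - sum(ck * ak for ck, ak in zip(c, a))  # canceller output
        decision = 1.0 if z >= 0 else -1.0             # binary slicer
        err_sign = 1.0 if z - decision >= 0 else -1.0
        for k in range(ntaps):
            # c[k] += mu * sign(e[n]) * sign(a[n-k])
            c[k] += mu * err_sign * (1.0 if a[k] >= 0 else -1.0)
    return c


rng = random.Random(1)
N = 20_000
victim = [rng.choice((-1.0, 1.0)) for _ in range(N)]
aggressor = [rng.choice((-1.0, 1.0)) for _ in range(N)]
h = [0.3, -0.1, 0.05]  # hypothetical NEXT coupling taps
taps = ss_lms_crosstalk(victim, aggressor, h)
print([round(t, 2) for t in taps])  # should settle close to h
```

Sign-sign updates need only a comparator and an up/down counter per tap, which is why this form suits an on-die adaptation engine; the price is a steady-state dither of about one step size `mu` around the optimum.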
In conclusion, the proposed transmitter-side and receiver-side architectures together offer a good solution for high-speed serial I/O links, improving performance by overcoming physical channel loss and adjacent-channel crosstalk as systems grow more complex.
High-Speed Low-Voltage Line Driver for SerDes Applications
The driving factor behind this research was to design and develop a line driver capable of meeting the demanding specifications of the next generation of SerDes devices. In this thesis, various line driver topologies were analysed to identify a topology suited to a high-speed, low-voltage operating environment.
This thesis starts off by introducing a relatively new high-speed communication device called SerDes. SerDes is used in wired chip-to-chip communications and operates by converting a parallel data stream into a serial data stream that can then be transmitted at a higher bit rate; existing SerDes devices operate at up to 12.5Gbps. A matching SerDes device at the destination then converts the serial data stream back into a parallel data stream to be read by the destination ASIC. SerDes typically uses a line driver with a differential output, which increases resilience to external noise sources and reduces the electromagnetic (EM) radiation produced by the transmission.
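To make the serialize/deserialize and differential-signaling description concrete, here is a minimal behavioral sketch. The MSB-first bit order, the 8-bit word width, the 0.4 V differential swing, and the deterministic pseudo-noise are illustrative assumptions, not values from the thesis.

```python
def serialize(words, width=8):
    """Parallel-to-serial: emit each word MSB first (bit order is an assumption)."""
    return [(w >> i) & 1 for w in words for i in reversed(range(width))]


def deserialize(bits, width=8):
    """Serial-to-parallel: regroup the bit stream into width-bit words."""
    if len(bits) % width:
        raise ValueError("bit count must be a multiple of the word width")
    words = []
    for k in range(0, len(bits), width):
        w = 0
        for b in bits[k:k + width]:
            w = (w << 1) | b
        words.append(w)
    return words


def differential_drive(bits, swing=0.4):
    """Drive the two wires in opposite directions (common-mode level omitted)."""
    p = [swing / 2 if b else -swing / 2 for b in bits]
    return p, [-v for v in p]


def differential_receive(p, n, common_noise):
    """Identical noise couples onto both wires; p - n cancels it."""
    return [1 if (pv + e) - (nv + e) > 0 else 0 for pv, nv, e in zip(p, n, common_noise)]


def single_ended_receive(p, common_noise):
    """For comparison: decide on one wire alone, so the noise is not cancelled."""
    return [1 if pv + e > 0 else 0 for pv, e in zip(p, common_noise)]


words = [0xA5, 0x3C, 0xFF, 0x00]
bits = serialize(words)
p, n = differential_drive(bits)
# Deterministic pseudo-noise in [-2, 2), far larger than the 0.2 V per-wire swing.
noise = [((i * 7919) % 200 - 100) / 50 for i in range(len(bits))]
print(deserialize(bits) == words)                 # True
print(differential_receive(p, n, noise) == bits)  # True
print(single_ended_receive(p, noise) == bits)     # False
```

Only noise that is identical on both wires cancels this cleanly; real channels also add differential noise and loss, which this sketch ignores.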
The focus of this research is to design and develop a line driver that can operate at 40Gbps with a power supply of less than 1 V. This demanding specification was chosen as an accurate representation of the future requirements that a line driver in a SerDes device will have to meet.
A suitable line driver with a differential output was identified to meet the demanding specifications and was modified so that it can perform an equalisation technique called pre-distortion. Two variations of the new topology were outlined, and a behavioural model was created for both using MATLAB Simulink. The behavioural models proved the concept for both variants; however, only one variant maintained its performance once the designs were implemented at transistor level in Cadence, using a 65nm CMOS technology provided by Texas Instruments.
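The abstract does not specify the pre-distortion scheme. A common form is a 2-tap FIR de-emphasis, y[n] = c0·x[n] + c1·x[n-1] with c1 < 0, in which transitions get full swing and repeated bits are attenuated. The sketch below assumes that form, the normalization |c0| + |c1| = 1 to keep the peak swing fixed, and a hypothetical post-tap of -0.25.

```python
import math


def pre_emphasis(symbols, post_tap=-0.25):
    """2-tap FIR pre-distortion y[n] = c0*x[n] + c1*x[n-1], with |c0| + |c1| = 1."""
    if not -0.5 < post_tap <= 0:
        raise ValueError("expects a de-emphasis post-tap in (-0.5, 0]")
    c1 = post_tap
    c0 = 1.0 - abs(c1)  # normalization keeps the peak swing at full scale
    prev = 0.0          # assume the line idles at 0 before the first symbol
    out = []
    for x in symbols:
        out.append(c0 * x + c1 * prev)
        prev = x
    return out


def de_emphasis_db(post_tap):
    """Ratio of transition-bit amplitude to repeated-bit amplitude, in dB."""
    c0 = 1.0 - abs(post_tap)
    return 20 * math.log10((c0 + abs(post_tap)) / (c0 - abs(post_tap)))


print(pre_emphasis([1, 1, 1, -1, -1, 1]))  # [0.75, 0.5, 0.5, -1.0, -0.5, 1.0]
print(round(de_emphasis_db(-0.25), 2))      # 6.02
```

The attenuated repeated bits act as a high-pass shaping that roughly mirrors the channel's low-pass loss, which is the purpose of pre-distortion in the driver.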
The final line driver design was then converted into a layout, again using Cadence, and RC parasitics were extracted to perform a post-layout simulation. The post-layout simulation shows that the novel line driver can operate at 40Gbps with a power supply from 1 V down to 0.8 V and has a power consumption of 4.54 mW/Gbps. The deterministic jitter added by the line driver is 12.9 ps.
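As a quick consistency check, and assuming the 4.54 mW/Gbps efficiency figure is referred to the 40Gbps operating point, it converts to total driver power and energy per bit as:

```latex
P_{\mathrm{total}} = 4.54\,\frac{\mathrm{mW}}{\mathrm{Gb/s}} \times 40\,\mathrm{Gb/s} = 181.6\,\mathrm{mW},
\qquad
E_{\mathrm{bit}} = \frac{4.54\times10^{-3}\,\mathrm{W}}{10^{9}\,\mathrm{b/s}} = 4.54\,\mathrm{pJ/bit}.
```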
Bridging the gap: an optimization-based framework for fast, simultaneous circuit & system design space exploration
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008. Includes bibliographical references (p. 107-110). Design of modern mixed-signal integrated circuits is becoming increasingly difficult. Continued MOSFET scaling is approaching global power-dissipation limits while increasing transistor variability, so power and area resources must be allocated carefully to meet increasingly aggressive performance specifications. In this tightly constrained environment, the traditional iterative system-to-circuit redesign loop is becoming inefficient. With complex system architectures and circuit specifications approaching the technological limits of the process employed, designers have little margin left to absorb the overhead of strict system and circuit design interdependencies, and a severely constrained modern mixed-signal IC design can take many iterations to converge in such a design flow. This is an expensive and time-consuming process. The situation is particularly acute in high-speed links: as an important building block of many systems (high speed I/O, on-chip communication, ...), they must have high power efficiency and a small area footprint. Design of these systems is challenging in both the system and circuit domains. On one hand, system architectures are becoming increasingly complex to provide the necessary performance increase; on the other, circuit implementations of these increasingly complicated systems are difficult to achieve under tight power and area budgets. To bridge this gap between system and circuit design, we formulate a circuit-to-system optimization-driven framework. It is an equation-based description, powered by a human designer. Provided with an equation-based model, we use fast optimization tools to quickly scout the available design space.
The presence of a designer in the flow is an invaluable resource, enabling significant savings by simplifying the models to capture only the relevant information and by constraining the search space to areas where meaningful solutions can be expected. Thus, the computational overhead that plagues simulation-based design space exploration and design optimization is greatly reduced. The flow is powered by a signomial optimization engine. The key challenge is to bring problems that are very different from a modeling point of view, such as circuit design and system design, into the realm of an optimization engine that can solve them jointly, thus breaking the redesign loop or at least shortening it. Relying on signomial programming is necessary to accurately model all the relevant phenomena that arise in electrical circuits and at the system level. For example, the regions of operation of transistors under given biasing conditions cannot be modeled accurately with simpler types of equations. Similarly, calculating the effect of filtering on a signal also requires the ability to handle signomial equations. Thus, signomial programming is necessary yet not fully explored, and finding a suitable formulation can take some experimentation, as this thesis shows. Signomial programming, as a general non-convex optimization problem, is still an active research area. Most of the solutions proposed so far involve local convexification of the problem combined with a branch-and-bound type of search. Furthermore, most non-convex problems are solved for one particular system of equations, and no general methodology that is both reliable and efficient is known. Thus, a large part of the work presented in this thesis details how to construct a system formulation that the optimization engine can solve efficiently and reliably. We tested different formulations and measured their performance in terms of parsing speed, solving speed, and accuracy.
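One widely used local convexification for signomial programs is monomial condensation; the abstract does not say which scheme the thesis uses, so this is an illustration of the general idea only. A signomial constraint f(x) - g(x) ≤ 0 with posynomials f and g is rewritten as f(x)/g(x) ≤ 1, and g is replaced by a monomial ĝ built with the AM-GM inequality at the current point x0. Since ĝ ≤ g everywhere and ĝ = g at x0, the constraint f/ĝ ≤ 1 is a geometric-programming constraint that implies the original one; re-condensing at each new solution gives a sequential GP. The helper names are hypothetical.

```python
import math


def monomial(x, coeff, exps):
    """Evaluate coeff * prod(x[j] ** exps[j]) for positive x."""
    return coeff * math.prod(xj ** aj for xj, aj in zip(x, exps))


def posynomial(x, terms):
    """terms is a list of (coeff, exps) pairs with coeff > 0."""
    return sum(monomial(x, c, a) for c, a in terms)


def condense(terms, x0):
    """Monomial approximation of a posynomial at x0 (AM-GM condensation).

    With weights w_i = u_i(x0) / p(x0), AM-GM gives
    p(x) >= prod_i (u_i(x) / w_i) ** w_i, with equality at x = x0.
    """
    total = posynomial(x0, terms)
    coeff, exps = 1.0, [0.0] * len(x0)
    for c, a in terms:
        w = monomial(x0, c, a) / total
        coeff *= (c / w) ** w
        exps = [e + w * aj for e, aj in zip(exps, a)]
    return coeff, exps


terms = [(1.0, [1, 0]), (2.0, [0, 1]), (0.5, [1, 1])]  # x + 2y + 0.5xy
x0 = [1.0, 2.0]
c, a = condense(terms, x0)
print(monomial(x0, c, a), posynomial(x0, terms))  # equal at x0, up to rounding
print(monomial([3.0, 1.0], c, a) <= posynomial([3.0, 1.0], terms))  # True
```

In log variables a monomial becomes affine, which is what makes the condensed constraint convex.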
From these tests we motivate and explain how a series of transformations we introduce improves our formulation and arrives at a well-behaved and reliable form. We show how to apply our design flow to high-speed link design. By restructuring the traditional design flow, we derive system and circuit abstractions. These sub-problems are interfaced through a set of well-defined interface variables, which enables code-level separation of the problem descriptions and thus yields a modular system and circuit model that is easy to read and maintain. Finally, we develop a set of scripts that automate the formulation of parametrized system-level descriptions. We explain how our transformations influence the speed of this process as well as the size of the model produced. By Ranko Sredojević. S.M.
Research and design of high-speed advanced analogue front-ends for fibre-optic transmission systems
In the last decade, we have witnessed the emergence of large, warehouse-scale data centres, which have enabled new internet-based software applications such as cloud computing, search engines, social media, e-government, etc. Such data centres consist of large collections of servers interconnected using short-reach (up to a few hundred meters) optical interconnects. Today, transceivers for these applications achieve up to 100Gb/s by multiplexing 10 × 10Gb/s or 4 × 25Gb/s channels. In the near future, however, data centre operators have expressed a need for optical links that can support 400Gb/s up to 1Tb/s. The crucial challenge is to achieve this in the same footprint (same transceiver module) and with similar power consumption as today's technology. Straightforward scaling of the currently used space or wavelength division multiplexing may be difficult to achieve: indeed, a 1Tb/s transceiver would require the integration of 40 VCSELs (vertical-cavity surface-emitting laser diodes, widely used for short-reach optical interconnect), 40 photodiodes, and the electronics operating at 25Gb/s in the same module as today's 100Gb/s transceiver. Pushing the bit rate on such links beyond today's commercially available 100Gb/s per fibre will require new generations of VCSELs and their driver and receiver electronics. This work looks into a number of state-of-the-art technologies, investigates their performance constraints, and recommends a different set of designs, specifically targeting multilevel modulation formats. Several methods to extend the bandwidth using deep-submicron (65nm and 28nm) CMOS technology are explored in this work, while also maintaining a focus on reducing power consumption and chip area. The techniques used were pre-emphasis on the rising and falling edges of the signal and bandwidth extension by inductive peaking and various local feedback techniques.
These techniques have been applied to a transmitter and receiver developed for advanced modulation formats such as PAM-4 (4-level pulse amplitude modulation). Such a modulation format increases the throughput per individual channel, which helps to overcome the challenges mentioned above in realizing 400Gb/s to 1Tb/s transceivers.
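The throughput argument can be checked with simple lane-count arithmetic. The sketch assumes each lane keeps the 25 GBd symbol rate of today's 25Gb/s NRZ lanes, so PAM-4 (2 bits per symbol) doubles the per-lane bit rate. The Gray-coded level mapping is a common convention and an assumption here, not a detail given in the thesis abstract.

```python
import math

# Gray-coded PAM-4 levels: neighboring levels differ in exactly one bit.
GRAY_PAM4 = {(0, 0): -3, (0, 1): -1, (1, 1): +1, (1, 0): +3}


def pam4_map(bits):
    """Map bit pairs (first bit is the MSB) to PAM-4 levels."""
    if len(bits) % 2:
        raise ValueError("PAM-4 needs an even number of bits")
    return [GRAY_PAM4[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]


def lanes_needed(aggregate_gbps, baud_gbd, bits_per_symbol):
    """Parallel lanes (e.g. VCSEL/photodiode pairs) needed at a fixed symbol rate."""
    return math.ceil(aggregate_gbps / (baud_gbd * bits_per_symbol))


print(pam4_map([0, 0, 0, 1, 1, 1, 1, 0]))  # [-3, -1, 1, 3]
print(lanes_needed(1000, 25, 1))  # 40 lanes with NRZ (PAM-2)
print(lanes_needed(1000, 25, 2))  # 20 lanes with PAM-4 at the same 25 GBd
```

Gray coding means the most likely symbol error, a slip to an adjacent level, costs only one bit error; the trade-off of PAM-4 is a roughly threefold smaller eye height than NRZ at the same peak swing.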
Exploration and Design of High Performance Variation Tolerant On-Chip Interconnects
Transferred from Doria.
Design of High-Speed SerDes Transceiver for Chip-to-Chip Communications in CMOS Process
With the continuous increase of on-chip computation capacity and the exponential growth of data-intensive applications, high-speed data transmission through serial links has become the backbone of modern communication systems. To satisfy the massive data-exchange requirements, the data rate of such serial links has increased from several Gb/s to tens of Gb/s. Currently, commercial standards such as Ethernet 400GbE, InfiniBand high data rate (HDR), and common electrical interface (CEI)-56G have been developing towards 40+ Gb/s. As the core component within these links, the transceiver chipset plays a fundamental role in balancing operating speed, power consumption, area, and operating range. Meanwhile, CMOS has become the dominant technology for modern transceiver chip fabrication due to its large-scale digital integration capability and aggressive pricing advantage. This research aims to explore advanced techniques that are capable of exploiting the maximum operating speed of the CMOS process, and hence to provide potential solutions for 40+ Gb/s CMOS transceiver designs. The major contributions are summarized as follows.
A low-jitter ring-oscillator-based injection-locked clock multiplier (RILCM) with a hybrid frequency tracking loop, consisting of a traditional phase-locked loop (PLL), a timing-adjusted loop, and a loop-selection state machine, is implemented in a 65-nm CMOS process. In the ring voltage-controlled oscillator, a full-swing pseudo-differential delay cell is proposed to reduce the conversion of device noise into phase noise. To obtain high operating speed and high detection accuracy, a compact timing-adjusted phase detector tightly combined with a well-matched charge pump is designed. Meanwhile, lock-loss detection and lock recovery are devised to give the RILCM a lock-acquisition ability similar to that of a conventional PLL, thus eliminating the need for an initial frequency setup aid and preventing the potential risk of lock loss. The experimental results show that the figure-of-merit of the designed RILCM reaches -247.3 dB, which is better than previous RILCMs and even comparable to large-area LC-ILCMs.
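The jitter-power figure of merit commonly reported for PLLs and ILCMs is FoM = 10·log10[(σt / 1 s)² · (P / 1 mW)], where σt is the RMS jitter and P the power consumption. Assuming the thesis uses this definition, a more negative value is better. The inputs below are hypothetical round numbers, not the measured jitter or power of the RILCM.

```python
import math


def jitter_power_fom_db(rms_jitter_s, power_w):
    """FoM = 10*log10((sigma_t / 1 s)**2 * (P / 1 mW)); more negative is better."""
    return 10 * math.log10(rms_jitter_s ** 2 * (power_w / 1e-3))


# Hypothetical operating points, not the thesis's measured values.
print(round(jitter_power_fom_db(1e-12, 1e-3), 1))     # -240.0
print(round(jitter_power_fom_db(100e-15, 10e-3), 1))  # -250.0
```

Because jitter enters squared, halving the RMS jitter improves the FoM by about 6 dB, while doubling the power costs only about 3 dB.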
The transmitter (TX) and receiver (RX) chips are separately designed and fabricated in a 65-nm CMOS process. The transmitter chip employs a quarter-rate multi-multiplexer (MUX)-based 4-tap feed-forward equalizer (FFE) to pre-distort the output. To increase the maximum operating speed, a bandwidth-enhanced 4:1 MUX capable of eliminating the charge-sharing effect is proposed. To produce the quarter-rate parallel data streams with appropriate delays, a compact latch array combined with an interleaved-retiming technique is designed. The receiver chip employs a two-stage continuous-time linear equalizer (CTLE) as the analog front-end and integrates an improved clock and data recovery (CDR) circuit to extract the sampling clocks and retime the incoming data. To automatically balance jitter tracking and jitter suppression, passive low-pass filters with adaptively adjusted bandwidth are introduced into the data-sampling path. To optimize the linearity of the phase interpolation, a time-averaging-based compensating phase interpolator is proposed. For equalization, a combined TX-FFE and RX-CTLE is applied to compensate for the channel loss, where a low-cost, edge-data correlation-based sign zero-forcing adaptation algorithm is proposed to automatically adjust the TX-FFE's tap weights. Measurement results show that the fabricated transmitter/receiver chipset can deliver 40 Gb/s random data at a bit error rate of … over a channel with 16 dB loss at the half-baud frequency, while consuming a total power of 370 mW.
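A CTLE boosts high frequencies relative to DC to offset channel loss. As an illustrative model (assumed, not taken from the thesis), one stage with one zero and two poles has H(s) = A_dc(1 + s/ωz)/((1 + s/ωp1)(1 + s/ωp2)). The sketch evaluates its peaking at the 20 GHz half-baud frequency of a 40 Gb/s NRZ stream using hypothetical corner frequencies.

```python
import math


def ctle_gain(f_hz, a_dc, fz, fp1, fp2):
    """|H(j*2*pi*f)| for H(s) = a_dc * (1 + s/wz) / ((1 + s/wp1) * (1 + s/wp2))."""
    # s/w = j*f/f_corner, so the 2*pi factors cancel and corners can stay in Hz.
    num = abs(1 + 1j * f_hz / fz)
    den = abs(1 + 1j * f_hz / fp1) * abs(1 + 1j * f_hz / fp2)
    return a_dc * num / den


def peaking_db(f_hz, **stage):
    """High-frequency gain relative to DC gain, in dB."""
    return 20 * math.log10(ctle_gain(f_hz, **stage) / ctle_gain(0.0, **stage))


# Hypothetical single-stage corners; 40 Gb/s NRZ -> 20 GHz half-baud frequency.
stage = dict(a_dc=0.5, fz=5e9, fp1=20e9, fp2=40e9)
one = peaking_db(20e9, **stage)
print(f"one stage: {one:.2f} dB of peaking at 20 GHz")
```

Cascading stages, as in a two-stage CTLE, adds their peaking in dB, which lets each stage use a moderate zero-pole spacing.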
Silicon Photonics for All-Optical Processing and High-Bandwidth-Density Interconnects
Silicon photonics has emerged in recent years as one of the leading technologies poised to enable penetration of optical communications deeper and more intimately into computing systems than ever before. The integration potential of power-efficient WDM links at the first-level package or even deeper has been a strong driver for the rapid development this field has seen in recent years. The integration of photonic communication modules with very high bandwidth densities and virtually no bandwidth-distance limitations in the short-reach regime of high-performance computers and data centers has the potential to alleviate many of the bandwidth bottlenecks currently faced at the board, rack, and facility levels. While networks-on-chip for chip multiprocessors (CMPs) were initially deemed the target application of silicon photonic components, it has become evident in recent years that the lower-hanging fruit is the CMP's I/O links to memory and to other CMPs. The first chapter of the thesis provides more detailed motivation for the integration of silicon photonic modules into compute systems and surveys some of the recent developments in the field. The second chapter then details a technical case study of the scalability and power efficiency of silicon photonic microring-based WDM links for these chip I/O applications, which could be developed in the intermediate future. The analysis, initiated originally for a workshop on optical and electrical board- and rack-level interconnects, examines a detailed model of the optical power budget for such a link, capturing both single-channel aspects and WDM-operation-related considerations that are unique to the physical characteristics of microrings.
The holistic analysis for the full link captures the wavelength-channel-spacing-dependent characteristics, provides some methodologies for device design in the WDM-operation context, and provides performance predictions based on current best-in-class silicon photonic devices. The key results of the analysis are the determination of upper bounds on the aggregate achievable communication bandwidth per link, the identification of design trade-offs between bandwidth and power efficiency, and the need for continued technological improvements in both laser and photodetector technologies to allow operation of such systems at acceptable power efficiency. The third chapter, while continuing on the theme of silicon photonic high-bandwidth-density links, details the first experimental demonstration and characterization of an on-chip spatial division multiplexing (SDM) scheme based on microrings for the multiplexing and demultiplexing functionalities. In the context of more forward-looking optical network-on-chip environments, SDM-enabled WDM photonic interconnects can potentially achieve superior bandwidth densities per waveguide compared to WDM-only photonic interconnects. The microring-based implementation allows dynamic tuning of the multiplexing and demultiplexing characteristics of the system, which allows operation on a WDM grid as well as device tuning to combat intra-channel crosstalk. The characterization focuses on the first reported power penalty measurements for an on-chip silicon photonic SDM link, showing minimal penalties with 3 spatial modes concurrently operating on a single waveguide and 10-Gb/s data carried by each mode. The chapter also details the first demonstration of WDM combined with SDM operation, with six separate wavelength-and-spatial 10-Gb/s channels showing error-free operation and low power penalties.
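The per-wavelength optical power budget in the second chapter's analysis can be sketched as a dB sum: the laser must deliver the receiver sensitivity plus every insertion loss and penalty along the path. All numbers below (sensitivity, losses, the crosstalk penalty, and the total-power ceiling used to bound the channel count) are hypothetical placeholders, not results from the thesis.

```python
import math


def required_laser_dbm(sensitivity_dbm, losses_db, penalties_db):
    """Per-wavelength laser power needed to close the link (all terms in dB/dBm)."""
    return sensitivity_dbm + sum(losses_db.values()) + sum(penalties_db.values())


def dbm_to_mw(p_dbm):
    return 10 ** (p_dbm / 10)


def max_channels(total_power_cap_mw, per_channel_mw):
    """Upper bound on WDM channels if total optical power is capped (hypothetical cap)."""
    return math.floor(total_power_cap_mw / per_channel_mw)


losses = {"laser coupling": 2.0, "ring modulator": 4.0, "fiber coupling": 2.0, "ring drop filter": 1.0}
penalties = {"inter-channel crosstalk": 1.0}
p_ch = required_laser_dbm(-15.0, losses, penalties)
print(p_ch, round(dbm_to_mw(p_ch), 3), max_channels(20.0, dbm_to_mw(p_ch)))  # -5.0 0.316 63
```

In a real microring WDM link the crosstalk penalty grows as channel spacing shrinks, so the channel count and the per-channel budget are coupled rather than independent as in this sketch.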
The fourth, fifth, and sixth chapters shift in topic from the application of silicon photonics to communication links to the evolving use of silicon waveguides for nonlinear all-optical processing. The uniquely tight mode confinement in sub-micron cross-sections combined with the high nonlinear response of silicon has motivated the development of four-wave mixing (FWM)-based silicon processing devices. The key feature of the silicon platform for these nonlinear processing applications is the ability to finely and uniformly control the dispersive properties of the optical structures in a way that completely offsets the material dispersion and achieves the dispersion profiles required for effective parametric interaction of waves in the optical structures. Chapter four primarily introduces and motivates nonlinear processing in communication applications and focuses on recent achievements in non-silicon and silicon FWM platforms. Chapter five describes some of the author's contributions on parametric processing of high-speed data in silicon nonlinear devices, with first-of-a-kind demonstrations of wavelength conversion of 160-Gb/s optically time division multiplexed (OTDM) data as well as the wavelength multicasting of a 320-Gb/s OTDM stream. The chapter then details a methodical characterization and demonstration of several record wavelength conversion experiments of data in silicon, with 40-Gb/s data wavelength-converted across more than 100 nm with only 1.4 dB of power penalty, as well as the wavelength and format conversion of 10-Gb/s data across up to 168 nm with sensitivity gains of about 2 dB stemming from the format conversion and a residual conversion penalty of only 0.1 dB, achieved by implementing an improved experimental setup. Both experiments highlight the performance uniformity of the conversion process over a wide range of probe-idler detuning settings, showcasing the silicon platform's unique broadband phase-matching properties.
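In degenerate FWM, energy conservation fixes the idler frequency at f_i = 2f_p - f_s, i.e. 1/λi = 2/λp - 1/λs, so the idler mirrors the probe about the pump in frequency. The short sketch below computes the idler wavelength for hypothetical pump and probe wavelengths near 1550 nm; it ignores phase matching, which determines how efficiently the idler is generated but not where it appears.

```python
def idler_wavelength_nm(pump_nm, signal_nm):
    """Degenerate FWM: f_i = 2*f_p - f_s, i.e. 1/lambda_i = 2/lambda_p - 1/lambda_s."""
    inv = 2.0 / pump_nm - 1.0 / signal_nm
    if inv <= 0:
        raise ValueError("no idler: signal is too far detuned from the pump")
    return 1.0 / inv


lam_i = idler_wavelength_nm(1550.0, 1540.0)
print(f"{lam_i:.2f} nm")  # 1560.13 nm
```

Converting the idler back with the same pump returns the original probe wavelength, which is a handy sanity check on the formula.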
The sixth chapter presents a slight shift in motivation for parametric processing, from traditional telecom-wavelength applications to functionalities targeting mid-IR operation. Parametric processing in the silicon platform at long wavelengths holds large potential for performance improvements, due to the elimination of two-photon absorption in silicon at long wavelengths as well as silicon's dispersion-engineering capabilities, which uniquely position the silicon platform for effective phase matching of significantly wavelength-detuned waves. Four-wave mixing signal generation and reception at mid-IR wavelengths are attractive candidates for tunable, flexible operation with modulation and detection speeds that are currently available only at telecom wavelengths. With this vision in mind, the chapter details several contributions that extend FWM functionalities in silicon to wavelengths close to 2 μm, with performance equivalent to that measured at much smaller detuning settings. The contributions detail the first experimental demonstrations of silicon optical processing functionalities at such long wavelengths, including the wavelength conversion and unicast of 10-Gb/s signals with up to 700 nm of probe-idler detuning; the combined two-stage 10-Gb/s FWM link, in which both data generation and detection at 1900 nm are facilitated by parametric processing in silicon with only 2.1 dB overall penalty; the first ever 40-Gb/s receiver at 1900 nm based on an FWM stage for simultaneous temporal demultiplexing and wavelength conversion; and, lastly, the demonstration of a 40-Gb/s FWM link operating with only 3.6 dB of penalty. The chapter concludes with a short discussion of possible extensions to enable silicon parametric processing at even longer wavelengths, targeting the mid-IR spectral transmission window of 3-5 μm.
Belle II Technical Design Report
The Belle detector at the KEKB electron-positron collider has collected almost 1 billion Υ(4S) events in its decade of operation. Super-KEKB, an upgrade of KEKB, is under construction during a three-year shutdown to increase the luminosity by two orders of magnitude, with an ultimate goal of 8 × 10³⁵ cm⁻² s⁻¹. To exploit the increased luminosity, an upgrade of the Belle detector has been proposed, and a new international collaboration, Belle-II, is being formed. The Technical Design Report presents the physics motivation, the basic methods of the accelerator upgrade, and the key improvements of the detector. Comment: Edited by: Z. Doležal and S. Un
Hardware and Methods for Scaling Up Quantum Information Experiments
Quantum computation promises to solve presently intractable problems, with hopes of yielding solutions to pressing issues facing society. Despite this, current machines are limited to tens of qubits. The field is in a state of continuous scaling, with groups around the world working on all aspects of this problem. The work of this thesis aims to contribute to this effort. It is motivated by the goal of increasing both the speed and bandwidth of experiments conducted within our laboratory. Low-loss radio-frequency multiplexers were characterised at cryogenic temperatures, with some shown to operate below 7 mK. The Analog Devices ADG904 was one of these, and its insertion loss was measured at < 0.5 dB up to 2 GHz. The heat load of the multiplexers was measured, and a switching speed of 10 MHz with an RF signal power of -30 dBm was found to dissipate 43 μW. Installing these switches yields a benefit over installing extra cabling in our cryostat for switching speeds of up to 2 MHz and an RF power of -30 dBm. A switch matrix was prototyped for cryogenic operation, enabling re-routing of wiring inside a cryostat with a minimally increased thermal load. This could be used to significantly increase the scale of high-frequency experiments. This switch has also been embedded within a calibration routine, facilitating measurement of a specific feature of interest at millikelvin temperatures. As the field of quantum engineering scales, such measurements will be crucial to close the loop, providing feedback to fabrication and semiconductor growth efforts. Finally, a rapid-turnaround test rig has been developed with 32 high-frequency and 100 DC lines, enabling tests of significant scale in liquid helium. This reduces the time per experiment at 4.2 K to hours rather than days, enabling tests such as thermal cycling, as well as the evaluation of on-chip structures or active electronics and classical computing hardware, all of which are necessary elements of any solid-state quantum computing architecture.