363 research outputs found
Recommended from our members
Cross-Layer Pathfinding for Off-Chip Interconnects
Off-chip interconnects for integrated circuits (ICs) today induce a diverse design space, spanning many different applications that require transmission of data at various bandwidths, latencies and link lengths. Off-chip interconnect design solutions are also variously sensitive to system performance, power and cost metrics, while also having a strong impact on these metrics. The costs associated with off-chip interconnects include die area, package (PKG) and printed circuit board (PCB) area, technology and bill of materials (BOM). Choices made regarding off-chip interconnects are fundamental to product definition, architecture, design implementation and technology enablement. Given their cross-layer impact, it is imperative that a cross-layer approach be employed to architect and analyze off-chip interconnects up front, so that a top-down design flow can comprehend the cross-layer impacts and correctly assess the system performance, power and cost tradeoffs for off-chip interconnects. Chip architects are not exposed to all the tradeoffs at the physical and circuit implementation or technology layers, and often lack the tools to accurately assess off-chip interconnects. Furthermore, the collaterals needed for a detailed analysis are often lacking when the chip is architected; these include circuit design and layout, PKG and PCB layout, and physical floorplan and implementation. To address the need for a framework that enables architects to assess the system-level impact of off-chip interconnects, this thesis presents power-area-timing (PAT) models for off-chip interconnects, optimization and planning tools with the appropriate abstraction using these PAT models, and die/PKG/PCB co-design methods that help expose the off-chip interconnect cross-layer metrics to the die/PKG/PCB design flows. Together, these models, tools and methods enable cross-layer optimization that allows for a top-down definition and exploration of the design space and helps converge on the correct off-chip interconnect implementation and technology choice. The tools presented cover off-chip memory interfaces for mobile and server products, silicon photonic interfaces, 2.5D silicon interposers and 3D through-silicon vias (TSVs). The goal of the cross-layer framework is to assess the key metrics of the interconnect (such as timing, latency, active/idle/sleep power, and area/cost) at an appropriate level of abstraction by being able to do this across layers of the design flow. In additional to signal interconnect, this thesis also explores the need for such cross-layer pathfinding for power distribution networks (PDN), where the system-on-chip (SoC) floorplan and pinmap must be optimized before the collateral layouts for PDN analysis are ready. Altogether, the developed cross-layer pathfinding methodology for off-chip interconnects enables more rapid and thorough exploration of a vast design space of off-chip parallel and serial links, inter-die and inter-chiplet links and silicon photonics. Such exploration will pave the way for off-chip interconnect technology enablement that is optimized for system needs. The basis of the framework can be extended to cover other interconnect technology as well, since it fundamentally relates to system-level metrics that are common to all off-chip interconnects
An Eight lanes 7Gb/s/pin Source Synchronous Single-Ended RX with Equalization and Far-End Crosstalk Cancellation for Backplane Channels
This paper presents a versatile crosstalk cancellation scheme for single-ended multi-lane backplane links. System-level investigations show that a scheme, which combines analog filters and decision-feedback crosstalk compensation on the receiver (RX) side only, can efficiently remove crosstalk patterns in straight channels as well as boards with reflections due to via stubs. An eight-lane single-ended RX has been manufactured in 32-nm SOI CMOS to validate our findings. A CTLE and eight-tap decision feedback equalizer equalize the channel without transmitter feedforward equalizer. A continuous time crosstalk canceller reduces precursors by nearest neighbors, while the residual postcursors from all aggressors are suppressed by direct feedback 7x8-tap decision-feedback crosstalk canceller (DFXC). Measurements with flip-chip packaged RX show that the RX macro can equalize both a 30-dB insertion loss single-ended channel with 0-dB signal-to-crosstalk at Nyquist and a channel with 28-dB attenuation with the signal-to-crosstalk ratio of 6 dB combined with reflections due to via stubs. The RX operates up to 7 Gb/s/pin with PRBS11 data at bit error rate (BER) <10⁻¹², and occupies 300x350 μm² with an energy efficiency of 5.9 mW/Gb/s from 1-V supply
Source-synchronous I/O Links using Adaptive Interface Training for High Bandwidth Applications
Mobility is the key to the global business which requires people to be always connected to a central server. With the exponential increase in smart phones, tablets, laptops, mobile traffic will soon reach in the range of Exabytes per month by 2018. Applications like
video streaming, on-demand-video, online gaming, social media applications will further increase the traffic load. Future application scenarios, such as Smart Cities, Industry 4.0, Machine-to-Machine (M2M) communications bring the concepts of Internet of Things (IoT) which requires high-speed low power communication infrastructures. Scientific applications, such as space exploration, oil exploration also require computing speed in the range of Exaflops/s by 2018 which means TB/s bandwidth at each memory node. To
achieve such bandwidth, Input/Output (I/O) link speed between two devices needs to be increased to GB/s.
The data at high speed between devices can be transferred serially using complex Clock-Data-Recovery (CDR) I/O links or parallely using simple source-synchronous I/O links. Even though CDR is more efficient than the source-synchronous method for single I/O link, but to achieve TB/s bandwidth from a single device, additional I/O links will be required and the source-synchronous method will be more advantageous in terms of area and power requirements as additional I/O links do not require extra hardware resources. At high speed, there are several non-idealities (Supply noise, crosstalk, Inter-
Symbol-Interference (ISI), etc.) which create unwanted skew problem among parallel source-synchronous I/O links. To solve these problems, adaptive trainings are used in time domain to synchronize parallel source-synchronous I/O links irrespective of these non-idealities.
In this thesis, two novel adaptive training architectures for source-synchronous I/O links are discussed which require significantly less silicon area and power in comparison to state-of-the-art architectures. First novel adaptive architecture is based on the unit delay concept to synchronize two parallel clocks by adjusting the phase of one clock in only one direction. Second novel adaptive architecture concept consists of Phase Interpolator (PI)-based Phase Locked Loop (PLL) which can adjust the phase in both direction and
achieve faster synchronization at the expense of added complexity. With an increase in parallel I/O links, clock skew which is generated by the improper clock tree, also affects the timing margin. Incorrect duty cycle further reduces the timing margin mainly in Double Data Rate (DDR) systems which are generally used to increase the bandwidth of a high-speed communication system. To solve clock skew and duty cycle problems, a novel clock tree buffering algorithm and a novel duty cycle corrector are described which further reduce the power consumption of a source-synchronous system
A 5.9mW/Gb/s 7Gb/s/pin 8-Lane Single-Ended RX with Crosstalk Cancellation Scheme using a XCTLE and 56-tap XDFE in 32nm SOI CMOS
This work reports an 8-lane single-ended RX featuring compact and low power far-end crosstalk (FEXT) cancellation circuits. The RX data-path consists of a cross continuous-time linear equalizer (XCTLE) to remove FEXT by nearest aggressors within the channel bundle. Residual post-cursor FEXT is suppressed by a direct feedback 7x8-tap cross decision feedback equalizer (XDFE). A CTLE and 8-tap DFE equalize single-ended channels with 28dB insertion loss at Nyquist frequency without TX FFE. The circuit, fabricated in 32nm SOI CMOS, was measured to receive 7Gb/s/pin PRBS11 data at BER< 10^-12 with 12.5%UI margin. It occupies 300x350um2 with an energy efficiency of 5.9mW/Gb/s
데이터 전송로 확장성과 루프 선형성을 향상시킨 다중채널 수신기들에 관한 연구
학위논문 (박사)-- 서울대학교 대학원 : 전기·컴퓨터공학부, 2013. 2. 정덕균.Two types of serial data communication receivers that adopt a multichannel architecture for a high aggregate I/O bandwidth are presented. Two techniques for collaboration and sharing among channels are proposed to enhance the loop-linearity and channel-expandability of multichannel receivers, respectively.
The first proposed receiver employs a collaborative timing scheme recovery which relies on the sharing of all outputs of phase detectors (PDs) among channels to extract common information about the timing and multilevel signaling architecture of PAM-4. The shared timing information is processed by a common global loop filter and is used to update the phase of the voltage-controlled oscillator with better rejection of per-channel noise. In addition to collaborative timing recovery, a simple linearization technique for binary PDs is proposed. The technique realizes a high-rate oversampling PD while the hardware cost is equivalent to that of a conventional 2x-oversampling clock and data recovery. The first receiver exploiting the collaborative timing recovery architecture is designed using 45-nm CMOS technology. A single data lane occupies a 0.195-mm2 area and consumes a relatively low 17.9 mW at 6 Gb/s at 1.0V. Therefore, the power efficiency is 2.98 mW/Gb/s. The simulated jitter is about 0.034 UI RMS given an input jitter value of 0.03 UI RMS, while the relatively constant loop bandwidth with the PD linearization technique is about 7.3-MHz regardless of the data-stream noise.
Unlike the first receiver, the second proposed multichannel receiver was designed to reduce the hardware complexity of each lane. The receiver employs shared calibration logic among channels and yet achieves superior channel expandability with slim data lanes. A shared global calibration control, which is used in a forwarded clock receiver based on a multiphase delay-locked loop, accomplishes skew calibration, equalizer adaptation, and the phase lock of all channels during a calibration period, resulting in reduced hardware overhead and less area required by each data lane. The
second forwarded clock receiver is designed in 90-nm CMOS technology. It achieves error-free eye openings of more than 0.5 UI across 9− 28 inch Nelco 4000-6 microstrips at 4− 7 Gb/s and more than 0.42 UI at data rates of up to 9 Gb/s. The data lane occupies only 0.152 mm2 and consumes 69.8 mW, while the rest of the receiver occupies 0.297 mm2 and consumes 56 mW at a data rate of 7 Gb/s and a supply voltage of 1.35 V.1. Introduction 1
1.1 Motivations
1.2 Thesis Organization
2. Previous Receivers for Serial-Data Communications
2.1 Classification of the Links
2.2 Clocking architecture of transceivers
2.3 Components of receiver
2.3.1 Channel loss
2.3.2 Equalizer
2.3.3 Clock and data recovery circuit
2.3.3.1. Basic architecture
2.3.3.2. Phase detector
2.3.3.2.1. Linear phase detector
2.3.3.2.2. Binary phase detector
2.3.3.3. Frequency detector
2.3.3.4. Charge pump
2.3.3.5. Voltage controlled oscillator and delay-line
2.3.4 Loop dynamics of PLL
2.3.5 Loop dynamics of DLL
3. The Proposed PLL-Based Receiver with Loop Linearization Technique
3.1 Introduction
3.2 Motivation
3.3 Overview of binary phase detection
3.4 The proposed BBPD linearization technique
3.4.1 Architecture of the proposed PLL-based receiver
3.4.2 Linearization technique of binary phase detection
3.4.3 Rotational pattern of sampling phase offset
3.5 PD gain analysis and optimization
3.6 Loop Dynamics of the 2nd-order CDR
3.7 Verification with the time-accurate behavioral simulation
3.8 Summary
4. The Proposed DLL-Based Receiver with Forwarded-Clock
4.1 Introduction
4.2 Motivation
4.3 Design consideration
4.4 Architecture of the proposed forwarded-clock receiver
4.5 Circuit description
4.5.1 Analog multi-phase DLL
4.5.2 Dual-input interpolating deley cells
4.5.3 Dedicated half-rate data samplers
4.5.4 Cherry-Hooper continuous-time linear equalizer
4.5.5 Equalizer adaptation and phase-lock scheme
4.6 Measurement results
5. Conclusion
6. BibliographyDocto
Towards the Design of Robust High-Speed and Power Efficient Short Reach Photonic Links
In 2014, approximately eight trillion transistors were fabricated every second thanks to improvements in integration density and fabrication processes. This increase in integration and functionality has also brought about the possibility of system on chip (SoC) and high-performance computing (HPC). Electrical interconnects presently dominate the very-short reach interconnect landscape (< 5 cm) in these applications. This, however, is expected to change. These interconnects' downfall will be caused by their need for impedance matching, limited pin-density and frequency dependent loss leading to intersymbol interference. In an attempt to solve this, researchers have increasingly explored integrated silicon photonics as it is compatible with current CMOS processes and creates many possibilities for short-reach applications.
Many see optical interconnects as the high-speed link solution for applications ranging from intra-data center (~200 m) down to module or even chip scales (< 2 cm). The attractive properties of optical interconnects, such as low loss and multiplexing abilities, will enable such things as Exascale high-performance computers of the future (equal to 10^18 calculations per second). In fact, forecasts predict that by 2025 photonics at the smallest levels of the interconnect hierarchy will be a reality. This thesis presents three novel research projects, which all work towards increasing robustness and cost-efficiency in short-reach optical links. It discusses three parts of the optical link: the interconnect, the receiver and the photodiode.
The first topic of this thesis is exploratory work on the use of an optical multiplexing technique, mode-division multiplexing (MDM), to carry multiple data lanes along with a forwarded clock for very short-reach applications. The second topic discussed is a novel reconfigurable CMOS receiver proposed as a method to map a clock signal to an interconnect lane in an MDM source-synchronous link with the lowest optical crosstalk. The receiver is designed as a method to make electronic chips that suit the needs of optical ones. By leveraging the more robust electronic integrated circuit, link solutions can be tuned to meet the needs of photonic chips on a die by die basis. The third topic of this thesis proposes a novel photodetector which uses photonic grating couplers to redirect vertical incident light to the horizontal direction. With this technique, the light is applied along the entire length of a p-n junction to improve the responsivity and speed of the device. Experimental results for this photodetector at 35 Gb/s are published, showing it to be the fastest all-silicon based photodetector reported in the literature at the time of publication
Recommended from our members
High Performance Local Oscillator Design for Next Generation Wireless Communication
Local Oscillator (LO) is an essential building block in modern wireless radios. In modern wireless radios, LO often serves as a reference of the carrier signal to modulate or demod- ulate the outgoing or incoming data. The LO signal should be a clean and stable source, such that the frequency or timing information of the carrier reference can be well-defined. However, as radio architecture evolves, the importance of LO path design has become much more important than before. Of late, many radio architecture innovations have exploited sophisticated LO generation schemes to meet the ever-increasing demands of wireless radio performances.
The focus of this thesis is to address challenges in the LO path design for next-generation high performance wireless radios. These challenges include (1) Congested spectrum at low radio frequency (RF) below 5GHz (2) Continuing miniaturization of integrated wireless radio, and (3) Fiber-fast (>10Gb/s) mm-wave wireless communication.
The thesis begins with a brief introduction of the aforementioned challenges followed by a discussion of the opportunities projected to overcome these challenges.
To address the challenge of congested spectrum at frequency below 5GHz, novel ra- dio architectures such as cognitive radio, software-defined radio, and full-duplex radio have drawn significant research interest. Cognitive radio is a radio architecture that opportunisti- cally utilize the unused spectrum in an environment to maximize spectrum usage efficiency. Energy-efficient spectrum sensing is the key to implementing cognitive radio. To enable energy-efficient spectrum sensing, a fast-hopping frequency synthesizer is an essential build- ing block to swiftly sweep the carrier frequency of the radio across the available spectrum. Chapter 2 of this thesis further highlights the challenges and trade-offs of the current LO gen-
eration scheme for possible use in sweeping LO-based spectrum analysis. It follows by intro- duction of the proposed fast-hopping LO architecture, its implementation and measurement results of the validated prototype. Chapter 3 proposes an embedded phase-shifting LO-path design for wideband RF self-interference cancellation for full-duplex radio. It demonstrates a synergistic design between the LO path and signal to perform self-interference cancellation.
To address the challenge of continuing miniaturization of integrated wireless radio, ring oscillator-based frequency synthesizer is an attractive candidate due to its compactness. Chapter 4 discussed the difficulty associated with implementing a Phase-Locked Loop (PLL) with ultra-small form-factor. It further proposes the concept sub-sampling PLL with time- based loop filter to address these challenges. A 65nm CMOS prototype and its measurement result are presented for validation of the concept.
In shifting from RF to mm-wave frequencies, the performance of wireless communication links is boosted by significant bandwidth and data-rate expansion. However, the demand for data-rate improvement is out-pacing the innovation of radio architectures. A >10Gb/s mm-wave wireless communication at 60GHz is required by emerging applications such as virtual-reality (VR) headsets, inter-rack data transmission at data center, and Ultra-High- Definition (UHD) TV home entertainment systems. Channel-bonding is considered to be a promising technique for achieving >10Gb/s wireless communication at 60GHz. Chapter 5 discusses the fundamental radio implementation challenges associated with channel-bonding for 60GHz wireless communication and the pros and cons of prior arts that attempted to address these challenges. It is followed by a discussion of the proposed 60GHz channel- bonding receiver, which utilizes only a single PLL and enables both contiguous and non- contiguous channel-bonding schemes.
Finally, Chapter 6 presents the conclusion of this thesis
A Low-Power BFSK/OOK Transmitter for Wireless Sensors
In recent years, significant improvements in semiconductor technology have allowed consistent development of wireless chipsets in terms of functionality and form factor. This has opened up a broad range of applications for implantable wireless sensors and telemetry devices in multiple categories, such as military, industrial, and medical uses. The nature of these applications often requires the wireless sensors to be low-weight and energy-efficient to achieve long battery life. Among the various functions of these sensors, the communication block, used to transmit the gathered data, is typically the most power-hungry block. In typical wireless sensor networks, transmission range is below 10 meters and required radiated power is below 1 milliwatt. In such cases, power consumption of the frequency-synthesis circuits prior to the power amplifier of the transmitter becomes significant. Reducing this power consumption is currently the focus of various research endeavors. A popular method of achieving this goal is using a direct-modulation transmitter where the generated carrier is directly modulated with baseband data using simple modulation schemes.
Among the different variations of direct-modulation transmitters, transmitters using unlocked digitally-controlled oscillators and transmitters with injection or resonator-locked oscillators are widely investigated because of their simple structure. These transmitters can achieve low-power and stable operation either with the help of recalibration or by sacrificing tuning capability. In contrast, phase-locked-loop-based (PLL) transmitters are less researched. The PLL uses a feedback loop to lock the carrier to a reference frequency with a programmable ratio and thus achieves good frequency stability and convenient tunability.
This work focuses on PLL-based transmitters. The initial goal of this work is to reduce the power consumption of the oscillator and frequency divider, the two most power-consuming blocks in a PLL. Novel topologies for these two blocks are proposed which achieve ultra-low-power operation. Along with measured performance, mathematical analysis to derive rule-of-thumb design approaches are presented. Finally, the full transmitter is implemented using these blocks in a 130 nanometer CMOS process and is successfully tested for low-power operation
- …