363 research outputs found

    An Eight lanes 7Gb/s/pin Source Synchronous Single-Ended RX with Equalization and Far-End Crosstalk Cancellation for Backplane Channels

    Get PDF
    This paper presents a versatile crosstalk cancellation scheme for single-ended multi-lane backplane links. System-level investigations show that a scheme, which combines analog filters and decision-feedback crosstalk compensation on the receiver (RX) side only, can efficiently remove crosstalk patterns in straight channels as well as boards with reflections due to via stubs. An eight-lane single-ended RX has been manufactured in 32-nm SOI CMOS to validate our findings. A CTLE and eight-tap decision feedback equalizer equalize the channel without transmitter feedforward equalizer. A continuous time crosstalk canceller reduces precursors by nearest neighbors, while the residual postcursors from all aggressors are suppressed by direct feedback 7x8-tap decision-feedback crosstalk canceller (DFXC). Measurements with flip-chip packaged RX show that the RX macro can equalize both a 30-dB insertion loss single-ended channel with 0-dB signal-to-crosstalk at Nyquist and a channel with 28-dB attenuation with the signal-to-crosstalk ratio of 6 dB combined with reflections due to via stubs. The RX operates up to 7 Gb/s/pin with PRBS11 data at bit error rate (BER) <10⁻¹², and occupies 300x350 μm² with an energy efficiency of 5.9 mW/Gb/s from 1-V supply

    Source-synchronous I/O Links using Adaptive Interface Training for High Bandwidth Applications

    Get PDF
    Mobility is the key to the global business which requires people to be always connected to a central server. With the exponential increase in smart phones, tablets, laptops, mobile traffic will soon reach in the range of Exabytes per month by 2018. Applications like video streaming, on-demand-video, online gaming, social media applications will further increase the traffic load. Future application scenarios, such as Smart Cities, Industry 4.0, Machine-to-Machine (M2M) communications bring the concepts of Internet of Things (IoT) which requires high-speed low power communication infrastructures. Scientific applications, such as space exploration, oil exploration also require computing speed in the range of Exaflops/s by 2018 which means TB/s bandwidth at each memory node. To achieve such bandwidth, Input/Output (I/O) link speed between two devices needs to be increased to GB/s. The data at high speed between devices can be transferred serially using complex Clock-Data-Recovery (CDR) I/O links or parallely using simple source-synchronous I/O links. Even though CDR is more efficient than the source-synchronous method for single I/O link, but to achieve TB/s bandwidth from a single device, additional I/O links will be required and the source-synchronous method will be more advantageous in terms of area and power requirements as additional I/O links do not require extra hardware resources. At high speed, there are several non-idealities (Supply noise, crosstalk, Inter- Symbol-Interference (ISI), etc.) which create unwanted skew problem among parallel source-synchronous I/O links. To solve these problems, adaptive trainings are used in time domain to synchronize parallel source-synchronous I/O links irrespective of these non-idealities. In this thesis, two novel adaptive training architectures for source-synchronous I/O links are discussed which require significantly less silicon area and power in comparison to state-of-the-art architectures. First novel adaptive architecture is based on the unit delay concept to synchronize two parallel clocks by adjusting the phase of one clock in only one direction. Second novel adaptive architecture concept consists of Phase Interpolator (PI)-based Phase Locked Loop (PLL) which can adjust the phase in both direction and achieve faster synchronization at the expense of added complexity. With an increase in parallel I/O links, clock skew which is generated by the improper clock tree, also affects the timing margin. Incorrect duty cycle further reduces the timing margin mainly in Double Data Rate (DDR) systems which are generally used to increase the bandwidth of a high-speed communication system. To solve clock skew and duty cycle problems, a novel clock tree buffering algorithm and a novel duty cycle corrector are described which further reduce the power consumption of a source-synchronous system

    A 5.9mW/Gb/s 7Gb/s/pin 8-Lane Single-Ended RX with Crosstalk Cancellation Scheme using a XCTLE and 56-tap XDFE in 32nm SOI CMOS

    Get PDF
    This work reports an 8-lane single-ended RX featuring compact and low power far-end crosstalk (FEXT) cancellation circuits. The RX data-path consists of a cross continuous-time linear equalizer (XCTLE) to remove FEXT by nearest aggressors within the channel bundle. Residual post-cursor FEXT is suppressed by a direct feedback 7x8-tap cross decision feedback equalizer (XDFE). A CTLE and 8-tap DFE equalize single-ended channels with 28dB insertion loss at Nyquist frequency without TX FFE. The circuit, fabricated in 32nm SOI CMOS, was measured to receive 7Gb/s/pin PRBS11 data at BER< 10^-12 with 12.5%UI margin. It occupies 300x350um2 with an energy efficiency of 5.9mW/Gb/s

    데이터 전송로 확장성과 루프 선형성을 향상시킨 다중채널 수신기들에 관한 연구

    Get PDF
    학위논문 (박사)-- 서울대학교 대학원 : 전기·컴퓨터공학부, 2013. 2. 정덕균.Two types of serial data communication receivers that adopt a multichannel architecture for a high aggregate I/O bandwidth are presented. Two techniques for collaboration and sharing among channels are proposed to enhance the loop-linearity and channel-expandability of multichannel receivers, respectively. The first proposed receiver employs a collaborative timing scheme recovery which relies on the sharing of all outputs of phase detectors (PDs) among channels to extract common information about the timing and multilevel signaling architecture of PAM-4. The shared timing information is processed by a common global loop filter and is used to update the phase of the voltage-controlled oscillator with better rejection of per-channel noise. In addition to collaborative timing recovery, a simple linearization technique for binary PDs is proposed. The technique realizes a high-rate oversampling PD while the hardware cost is equivalent to that of a conventional 2x-oversampling clock and data recovery. The first receiver exploiting the collaborative timing recovery architecture is designed using 45-nm CMOS technology. A single data lane occupies a 0.195-mm2 area and consumes a relatively low 17.9 mW at 6 Gb/s at 1.0V. Therefore, the power efficiency is 2.98 mW/Gb/s. The simulated jitter is about 0.034 UI RMS given an input jitter value of 0.03 UI RMS, while the relatively constant loop bandwidth with the PD linearization technique is about 7.3-MHz regardless of the data-stream noise. Unlike the first receiver, the second proposed multichannel receiver was designed to reduce the hardware complexity of each lane. The receiver employs shared calibration logic among channels and yet achieves superior channel expandability with slim data lanes. A shared global calibration control, which is used in a forwarded clock receiver based on a multiphase delay-locked loop, accomplishes skew calibration, equalizer adaptation, and the phase lock of all channels during a calibration period, resulting in reduced hardware overhead and less area required by each data lane. The second forwarded clock receiver is designed in 90-nm CMOS technology. It achieves error-free eye openings of more than 0.5 UI across 9− 28 inch Nelco 4000-6 microstrips at 4− 7 Gb/s and more than 0.42 UI at data rates of up to 9 Gb/s. The data lane occupies only 0.152 mm2 and consumes 69.8 mW, while the rest of the receiver occupies 0.297 mm2 and consumes 56 mW at a data rate of 7 Gb/s and a supply voltage of 1.35 V.1. Introduction 1 1.1 Motivations 1.2 Thesis Organization 2. Previous Receivers for Serial-Data Communications 2.1 Classification of the Links 2.2 Clocking architecture of transceivers 2.3 Components of receiver 2.3.1 Channel loss 2.3.2 Equalizer 2.3.3 Clock and data recovery circuit 2.3.3.1. Basic architecture 2.3.3.2. Phase detector 2.3.3.2.1. Linear phase detector 2.3.3.2.2. Binary phase detector 2.3.3.3. Frequency detector 2.3.3.4. Charge pump 2.3.3.5. Voltage controlled oscillator and delay-line 2.3.4 Loop dynamics of PLL 2.3.5 Loop dynamics of DLL 3. The Proposed PLL-Based Receiver with Loop Linearization Technique 3.1 Introduction 3.2 Motivation 3.3 Overview of binary phase detection 3.4 The proposed BBPD linearization technique 3.4.1 Architecture of the proposed PLL-based receiver 3.4.2 Linearization technique of binary phase detection 3.4.3 Rotational pattern of sampling phase offset 3.5 PD gain analysis and optimization 3.6 Loop Dynamics of the 2nd-order CDR 3.7 Verification with the time-accurate behavioral simulation 3.8 Summary 4. The Proposed DLL-Based Receiver with Forwarded-Clock 4.1 Introduction 4.2 Motivation 4.3 Design consideration 4.4 Architecture of the proposed forwarded-clock receiver 4.5 Circuit description 4.5.1 Analog multi-phase DLL 4.5.2 Dual-input interpolating deley cells 4.5.3 Dedicated half-rate data samplers 4.5.4 Cherry-Hooper continuous-time linear equalizer 4.5.5 Equalizer adaptation and phase-lock scheme 4.6 Measurement results 5. Conclusion 6. BibliographyDocto

    Towards the Design of Robust High-Speed and Power Efficient Short Reach Photonic Links

    Get PDF
    In 2014, approximately eight trillion transistors were fabricated every second thanks to improvements in integration density and fabrication processes. This increase in integration and functionality has also brought about the possibility of system on chip (SoC) and high-performance computing (HPC). Electrical interconnects presently dominate the very-short reach interconnect landscape (< 5 cm) in these applications. This, however, is expected to change. These interconnects' downfall will be caused by their need for impedance matching, limited pin-density and frequency dependent loss leading to intersymbol interference. In an attempt to solve this, researchers have increasingly explored integrated silicon photonics as it is compatible with current CMOS processes and creates many possibilities for short-reach applications. Many see optical interconnects as the high-speed link solution for applications ranging from intra-data center (~200 m) down to module or even chip scales (< 2 cm). The attractive properties of optical interconnects, such as low loss and multiplexing abilities, will enable such things as Exascale high-performance computers of the future (equal to 10^18 calculations per second). In fact, forecasts predict that by 2025 photonics at the smallest levels of the interconnect hierarchy will be a reality. This thesis presents three novel research projects, which all work towards increasing robustness and cost-efficiency in short-reach optical links. It discusses three parts of the optical link: the interconnect, the receiver and the photodiode. The first topic of this thesis is exploratory work on the use of an optical multiplexing technique, mode-division multiplexing (MDM), to carry multiple data lanes along with a forwarded clock for very short-reach applications. The second topic discussed is a novel reconfigurable CMOS receiver proposed as a method to map a clock signal to an interconnect lane in an MDM source-synchronous link with the lowest optical crosstalk. The receiver is designed as a method to make electronic chips that suit the needs of optical ones. By leveraging the more robust electronic integrated circuit, link solutions can be tuned to meet the needs of photonic chips on a die by die basis. The third topic of this thesis proposes a novel photodetector which uses photonic grating couplers to redirect vertical incident light to the horizontal direction. With this technique, the light is applied along the entire length of a p-n junction to improve the responsivity and speed of the device. Experimental results for this photodetector at 35 Gb/s are published, showing it to be the fastest all-silicon based photodetector reported in the literature at the time of publication

    A Low-Power BFSK/OOK Transmitter for Wireless Sensors

    Get PDF
    In recent years, significant improvements in semiconductor technology have allowed consistent development of wireless chipsets in terms of functionality and form factor. This has opened up a broad range of applications for implantable wireless sensors and telemetry devices in multiple categories, such as military, industrial, and medical uses. The nature of these applications often requires the wireless sensors to be low-weight and energy-efficient to achieve long battery life. Among the various functions of these sensors, the communication block, used to transmit the gathered data, is typically the most power-hungry block. In typical wireless sensor networks, transmission range is below 10 meters and required radiated power is below 1 milliwatt. In such cases, power consumption of the frequency-synthesis circuits prior to the power amplifier of the transmitter becomes significant. Reducing this power consumption is currently the focus of various research endeavors. A popular method of achieving this goal is using a direct-modulation transmitter where the generated carrier is directly modulated with baseband data using simple modulation schemes. Among the different variations of direct-modulation transmitters, transmitters using unlocked digitally-controlled oscillators and transmitters with injection or resonator-locked oscillators are widely investigated because of their simple structure. These transmitters can achieve low-power and stable operation either with the help of recalibration or by sacrificing tuning capability. In contrast, phase-locked-loop-based (PLL) transmitters are less researched. The PLL uses a feedback loop to lock the carrier to a reference frequency with a programmable ratio and thus achieves good frequency stability and convenient tunability. This work focuses on PLL-based transmitters. The initial goal of this work is to reduce the power consumption of the oscillator and frequency divider, the two most power-consuming blocks in a PLL. Novel topologies for these two blocks are proposed which achieve ultra-low-power operation. Along with measured performance, mathematical analysis to derive rule-of-thumb design approaches are presented. Finally, the full transmitter is implemented using these blocks in a 130 nanometer CMOS process and is successfully tested for low-power operation
    corecore