Search CORE

4,527 research outputs found

데이터 전송로 확장성과 루프 선형성을 향상시킨 다중채널 수신기들에 관한 연구

Author: 유병주
Publication venue: 서울대학교 대학원
Publication date: 01/02/2013
Field of study

학위논문 (박사)-- 서울대학교 대학원 : 전기·컴퓨터공학부, 2013. 2. 정덕균.Two types of serial data communication receivers that adopt a multichannel architecture for a high aggregate I/O bandwidth are presented. Two techniques for collaboration and sharing among channels are proposed to enhance the loop-linearity and channel-expandability of multichannel receivers, respectively. The first proposed receiver employs a collaborative timing scheme recovery which relies on the sharing of all outputs of phase detectors (PDs) among channels to extract common information about the timing and multilevel signaling architecture of PAM-4. The shared timing information is processed by a common global loop filter and is used to update the phase of the voltage-controlled oscillator with better rejection of per-channel noise. In addition to collaborative timing recovery, a simple linearization technique for binary PDs is proposed. The technique realizes a high-rate oversampling PD while the hardware cost is equivalent to that of a conventional 2x-oversampling clock and data recovery. The first receiver exploiting the collaborative timing recovery architecture is designed using 45-nm CMOS technology. A single data lane occupies a 0.195-mm2 area and consumes a relatively low 17.9 mW at 6 Gb/s at 1.0V. Therefore, the power efficiency is 2.98 mW/Gb/s. The simulated jitter is about 0.034 UI RMS given an input jitter value of 0.03 UI RMS, while the relatively constant loop bandwidth with the PD linearization technique is about 7.3-MHz regardless of the data-stream noise. Unlike the first receiver, the second proposed multichannel receiver was designed to reduce the hardware complexity of each lane. The receiver employs shared calibration logic among channels and yet achieves superior channel expandability with slim data lanes. A shared global calibration control, which is used in a forwarded clock receiver based on a multiphase delay-locked loop, accomplishes skew calibration, equalizer adaptation, and the phase lock of all channels during a calibration period, resulting in reduced hardware overhead and less area required by each data lane. The second forwarded clock receiver is designed in 90-nm CMOS technology. It achieves error-free eye openings of more than 0.5 UI across 9− 28 inch Nelco 4000-6 microstrips at 4− 7 Gb/s and more than 0.42 UI at data rates of up to 9 Gb/s. The data lane occupies only 0.152 mm2 and consumes 69.8 mW, while the rest of the receiver occupies 0.297 mm2 and consumes 56 mW at a data rate of 7 Gb/s and a supply voltage of 1.35 V.1. Introduction 1 1.1 Motivations 1.2 Thesis Organization 2. Previous Receivers for Serial-Data Communications 2.1 Classification of the Links 2.2 Clocking architecture of transceivers 2.3 Components of receiver 2.3.1 Channel loss 2.3.2 Equalizer 2.3.3 Clock and data recovery circuit 2.3.3.1. Basic architecture 2.3.3.2. Phase detector 2.3.3.2.1. Linear phase detector 2.3.3.2.2. Binary phase detector 2.3.3.3. Frequency detector 2.3.3.4. Charge pump 2.3.3.5. Voltage controlled oscillator and delay-line 2.3.4 Loop dynamics of PLL 2.3.5 Loop dynamics of DLL 3. The Proposed PLL-Based Receiver with Loop Linearization Technique 3.1 Introduction 3.2 Motivation 3.3 Overview of binary phase detection 3.4 The proposed BBPD linearization technique 3.4.1 Architecture of the proposed PLL-based receiver 3.4.2 Linearization technique of binary phase detection 3.4.3 Rotational pattern of sampling phase offset 3.5 PD gain analysis and optimization 3.6 Loop Dynamics of the 2nd-order CDR 3.7 Verification with the time-accurate behavioral simulation 3.8 Summary 4. The Proposed DLL-Based Receiver with Forwarded-Clock 4.1 Introduction 4.2 Motivation 4.3 Design consideration 4.4 Architecture of the proposed forwarded-clock receiver 4.5 Circuit description 4.5.1 Analog multi-phase DLL 4.5.2 Dual-input interpolating deley cells 4.5.3 Dedicated half-rate data samplers 4.5.4 Cherry-Hooper continuous-time linear equalizer 4.5.5 Equalizer adaptation and phase-lock scheme 4.6 Measurement results 5. Conclusion 6. BibliographyDocto

SNU Open Repository and Archive

통계적 주파수 검출기 기반 기준 주파수를 사용하지 않는 클록 및 데이터 복원 회로의 설계 방법론

Author: 최홍석
Publication venue: 서울대학교 대학원
Publication date: 01/08/2022
Field of study

학위논문(박사) -- 서울대학교대학원 : 공과대학 전기·정보공학부, 2022. 8. 정덕균.In this thesis, a design of a high-speed, power-efficient, wide-range clock and data recovery (CDR) without a reference clock is proposed. A frequency acquisition scheme using a stochastic frequency detector (SFD) based on the Alexander phase detector (PD) is utilized for the referenceless operation. Pat-tern histogram analysis is presented to analyze the frequency acquisition behavior of the SFD and verified by simulation. Based on the information obtained by pattern histogram analysis, SFD using autocovariance is proposed. With a direct-proportional path and a digital integral path, the proposed referenceless CDR achieves frequency lock at all measurable conditions, and the measured frequency acquisition time is within 7μs. The prototype chip has been fabricated in a 40-nm CMOS process and occupies an active area of 0.032 mm2. The proposed referenceless CDR achieves the BER of less than 10-12 at 32 Gb/s and exhibits an energy efficiency of 1.15 pJ/b at 32 Gb/s with a 1.0 V supply.본 논문은 기준 클럭이 없는 고속, 저전력, 광대역으로 동작하는 클럭 및 데이터 복원회로의 설계를 제안한다. 기준 클럭이 없는 동작을 위해서 알렉산더 위상 검출기에 기반한 통계적 주파수 검출기를 사용하는 주파수 획득 방식이 사용된다. 통계적 주파수 검출기의 주파수 추적 양상을 분석하기 위해 패턴 히스토그램 분석 방법론을 제시하였고 시뮬레이션을 통해 검증하였다. 패턴 히스토그램 분석을 통해 얻은 정보를 바탕으로 자기공분산을 이용한 통계적 주파수 검출기를 제안한다. 직접 비례 경로와 디지털 적분 경로를 통해 제안된 기준 클럭이 없는 클럭 및 데이터 복원회로는 모든 측정 가능한 조건에서 주파수 잠금을 달성하는 데 성공하였고, 모든 경우에서 측정된 주파수 추적 시간은 7μs 이내이다. 40-nm CMOS 공정을 이용하여 만들어진 칩은 0.032 mm2의 면적을 차지한다. 제안하는 클럭 및 데이터 복원회로는 32 Gb/s의 속도에서 비트에러율 10-12 이하로 동작하였고, 에너지 효율은 32Gb/s의 속도에서 1.0V 공급전압을 사용하여 1.15 pJ/b을 달성하였다.CHAPTER 1 INTRODUCTION 1 1.1 MOTIVATION 1 1.2 THESIS ORGANIZATION 13 CHAPTER 2 BACKGROUNDS 14 2.1 CLOCKING ARCHITECTURES IN SERIAL LINK INTERFACE 14 2.2 GENERAL CONSIDERATIONS FOR CLOCK AND DATA RECOVERY 24 2.2.1 OVERVIEW 24 2.2.2 JITTER 26 2.2.3 CDR JITTER CHARACTERISTICS 33 2.3 CDR ARCHITECTURES 39 2.3.1 PLL-BASED CDR – WITH EXTERNAL REFERENCE CLOCK 39 2.3.2 DLL/PI-BASED CDR 44 2.3.3 PLL-BASED CDR – WITHOUT EXTERNAL REFERENCE CLOCK 47 2.4 FREQUENCY ACQUISITION SCHEME 50 2.4.1 TYPICAL FREQUENCY DETECTORS 50 2.4.1.1 DIGITAL QUADRICORRELATOR FREQUENCY DETECTOR 50 2.4.1.2 ROTATIONAL FREQUENCY DETECTOR 54 2.4.2 PRIOR WORKS 56 CHAPTER 3 DESIGN OF THE REFERENCELESS CDR USING SFD 58 3.1 OVERVIEW 58 3.2 PROPOSED FREQUENCY DETECTOR 62 3.2.1 MOTIVATION 62 3.2.2 PATTERN HISTOGRAM ANALYSIS 68 3.2.3 INTRODUCTION OF AUTOCOVARIANCE TO STOCHASTIC FREQUENCY DETECTOR 75 3.3 CIRCUIT IMPLEMENTATION 83 3.3.1 IMPLEMENTATION OF THE PROPOSED REFERENCELESS CDR 83 3.3.2 CONTINUOUS-TIME LINEAR EQUALIZER (CTLE) 85 3.3.3 DIGITALLY-CONTROLLED OSCILLATOR (DCO) 87 3.4 MEASUREMENT RESULTS 89 CHAPTER 4 CONCLUSION 99 APPENDIX A DETAILED FREQUENCY ACQUISITION WAVEFORMS OF THE PROPOSED SFD 100 BIBLIOGRAPHY 108 초 록 122박

SNU Open Repository and Archive

Design and implementation of a 10 Gigabit Ethernet XAUI test systems

Author: Kundoor Meghana Reddy
Publication venue: University of New Hampshire Scholars\u27 Repository
Publication date: 01/01/2006
Field of study

10 Gigabit Ethernet has been standardized (IEEE 802.3ae), and products based on this standard are being deployed to interconnect MANs, WANs, Storage Area Networks, and very high speed LANs. The XAUI portion of the standard is primarily concerned with short range (up to 50 cm) chip-to-chip communication across printed circuit board traces. The UNH-IOL 10 Gigabit Ethernet Consortium, an industry-supported organization, performs PHY layer testing on products using a test system that has been partially implemented on a Xilinx ML321 evaluation board using the Virtex II-Pro FPGA. A new implementation of the 10 Gigabit Ethernet XAUI test system on the existing ML321 evaluation board is presented in this thesis. The new design removes a number of limitations present in the original Xilinx test system, and it adds new features to the existing transmit and receive sub-systems that enable test engineers to expand the range of test cases and analyze them while simultaneously increasing the speed of testing. The new test system also eliminates the need for expensive test instruments

UNH Scholars' Repository

Adaptive Interference Removal for Un-coordinated Radar/Communication Co-existence

Author: Lops Marco
Wang Xiaodong
Zheng Le
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 19/12/2017
Field of study

Most existing approaches to co-existing communication/radar systems assume that the radar and communication systems are coordinated, i.e., they share information, such as relative position, transmitted waveforms and channel state. In this paper, we consider an un-coordinated scenario where a communication receiver is to operate in the presence of a number of radars, of which only a sub-set may be active, which poses the problem of estimating the active waveforms and the relevant parameters thereof, so as to cancel them prior to demodulation. Two algorithms are proposed for such a joint waveform estimation/data demodulation problem, both exploiting sparsity of a proper representation of the interference and of the vector containing the errors of the data block, so as to implement an iterative joint interference removal/data demodulation process. The former algorithm is based on classical on-grid compressed sensing (CS), while the latter forces an atomic norm (AN) constraint: in both cases the radar parameters and the communication demodulation errors can be estimated by solving a convex problem. We also propose a way to improve the efficiency of the AN-based algorithm. The performance of these algorithms are demonstrated through extensive simulations, taking into account a variety of conditions concerning both the interferers and the respective channel states

arXiv.org e-Print Archive

IRIS Unicas (Università degli Studi di Cassino e del Lazio Meridionale)

Bit error rate estimation methods for QPSK CO-OFDM transmission

Author: Blow Keith J.
Le Son Thai
Mezentsev Vladimir K.
Turitsyn Sergei K.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/09/2014
Field of study

Coherent optical orthogonal frequency division multiplexing (CO-OFDM) is an attractive transmission technique to virtually eliminate intersymbol interference caused by chromatic dispersion and polarization-mode dispersion. Design, development, and operation of CO-OFDM systems require simple, efficient, and reliable methods of their performance evaluation. In this paper, we demonstrate an accurate bit error rate estimation method for QPSK CO-OFDM transmission based on the probability density function of the received QPSK symbols. By comparing with other known approaches, including data-aided and nondata-aided error vector magnitude, we show that the proposed method offers the most accurate estimate of the system performance for both single channel and wavelength division multiplexing QPSK CO-OFDM transmission systems

Crossref

Aston Publications Explorer

Power-Proportional Optical Links

Author: Ibn Abbas Abdullah
Publication venue
Publication date: 01/03/2021
Field of study

The continuous increase in data transfer rate in short-reach links, such as chip-to-chip and between servers within a data-center, demands high-speed links. As power efficiency becomes ever more important in these links, power-efficient optical links need to be designed. Power efficiency in a link can be achieved by enabling power-proportional communication over the serial link. In power-proportional links, the power dissipated by a link is proportional to the amount of data communicated. Normally, data-rate demand is not constant, and the peak data-rate is not required all the time. If a link is not adapted according to the data-rate demand, there will be a fixed power dissipation, and the power efficiency of the link will degrade during the sub-maximal link utilization. Adapting links to real-time data-rate requirements reduces power dissipation. Power proportionality is achieved by scaling the power of the serial link linearly with the link utilization, and techniques such as variable data-rate and burst-mode can be adopted for this purpose. Links whose data rate (and hence power dissipation) can be varied in response to system demands are proposed in this work. Past works have presented rapidly reconfigurable bandwidth in variable data-rate receivers, allowing lower power dissipation for lower data-rate operation. However, maintaining synchronization during reconfiguration was not possible since previous approaches have introduced changes in front-end delay when they are reconfigured. This work presents a technique that allows rapid bandwidth adjustment while maintaining a near-constant delay through the receiver suitable for a power-scalable variable data-rate optical link. Measurements of a fabricated integrated circuit (IC) show nearly constant energy per bit across a 2× variation in data rate while introducing less than 10 % of a unit interval (UI) of delay variation. With continuously increasing data communication in data-centers, parallel optical links with ever-increasing per-lane data rates are being used to meet overall throughput demands. Simultaneously, power efficiency is becoming increasingly important for these links since they do not transmit useful data all the time. The burst-mode solution for vertical-cavity surface-emitting laser (VCSEL)-based point-to-point communication can be used to improve links’ energy efficiency during low link activity. The burst-mode technique for VCSEL-based links has not yet been deployed commercially. Past works have presented burst-mode solutions for single-channel receivers, allowing lower power dissipation during low link activity and solutions for fast activation of the receivers. However, this work presents a novel technique that allows rapid activation of a front-end and fast locking of a clock-and-data-recovery (CDR) for a multi-channel parallel link, utilizing opportunities arising from the parallel nature of many VCSEL-based links. The idea has been demonstrated through electrical and optical measurements of a fabricated IC at 10 Gbps, which show fast data detection and activation of the circuitry within 49 UIs while allowing the front-end to achieve better energy efficiency during low link activity. Simulation results are also presented in support of the proposed technique which allows the CDR to lock within 26 UIs from when it is powered on

Concordia University Research Repository

Optimizing Collective Communication for Scalable Scientific Computing and Deep Learning

Author: Li Jiali
Publication venue: TRACE: Tennessee Research and Creative Exchange
Publication date: 01/08/2023
Field of study

In the realm of distributed computing, collective operations involve coordinated communication and synchronization among multiple processing units, enabling efficient data exchange and collaboration. Scientific applications, such as simulations, computational fluid dynamics, and scalable deep learning, require complex computations that can be parallelized across multiple nodes in a distributed system. These applications often involve data-dependent communication patterns, where collective operations are critical for achieving high performance in data exchange. Optimizing collective operations for scientific applications and deep learning involves improving the algorithms, communication patterns, and data distribution strategies to minimize communication overhead and maximize computational efficiency. Within the context of this dissertation, the specific focus is on optimizing the alltoall operation in 3D Fast Fourier Transform (FFT) applications and the allreduce operation in parallel deep learning, particularly on High-Performance Computing (HPC) systems. Advanced communication algorithms and methods are explored and implemented to improve communication efficiency, consequently enhancing the overall performance of 3D FFT applications. Furthermore, this dissertation investigates the identification of performance bottlenecks during collective communication over Horovod on distributed systems. These bottlenecks are addressed by proposing an optimized parallel communication pattern specifically tailored to alleviate the aforementioned limitations during the training phase in distributed deep learning. The objective is to achieve faster convergence and improve the overall training efficiency. Moreover, this dissertation proposes fault tolerance and elastic scaling features for distributed deep learning by leveraging the User-Level Failure Mitigation (ULFM) from Message Passing Interface (MPI). By incorporating ULFM MPI, the dissertation aims to enhance the elastic capabilities of distributed deep learning systems. This approach enables graceful and lightweight handling of failures while facilitating seamless scaling in dynamic computing environments

University of Tennessee, Knoxville: Trace

A 40-Gb/s Quarter-Rate SerDes Transmitter and Receiver Chipset in 65-nm CMOS

Author: Jiang H
Li F
Lv F
Wang Z
Wang Z
Yuan S
Yue S
Zhang C
Zhao F
Zheng X
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date
Field of study

This paper presents a 40-Gb/s transmitter (TX) and receiver (RX) chipset for chip-to-chip communications in a 65-nm CMOS process. The TX implements a quarter-rate multi-multiplexer (MUX)-based four-tap feed-forward equalizer (FFE), where a charge-sharing-effect elimination technique is introduced into the 4:1 MUX to optimize its jitter performance and power efficiency. The RX employs a two-stage continuous-time linear equalizer as the analog front end and integrates a low-cost sign-based zero-forcing engine relying on edge-data correlation to automatically adjust the tap weights of the TX-FFE. By embedding low-pass filters with an adaptively adjusting bandwidth into the data-sampling path and adopting high-linearity compensating phase interpolators, the clock data recovery achieves both high jitter tolerance and low jitter generation. The fabricated TX and RX chipset delivers 40-Gb/s PRBS data at BER 16-dB loss at half-baud frequency, while consuming a total power of 370 mW

LJMU Research Online (Liverpool John Moores University)

Advanced Equalization Techniques for Digital Coherent Optical Receivers

Author: Arlunno Valeria
Publication venue: Technical University of Denmark
Publication date: 01/01/2013
Field of study

Online Research Database In Technology