Search CORE

21 research outputs found

고속 시리얼 링크를 위한 고리 발진기를 기반으로 하는 주파수 합성기

Author: 김효준
Publication venue: 서울대학교 대학원
Publication date: 01/08/2022
Field of study

학위논문(박사) -- 서울대학교대학원 : 공과대학 전기·정보공학부, 2022. 8. 정덕균.In this dissertation, major concerns in the clocking of modern serial links are discussed. As sub-rate, multi-standard architectures are becoming predominant, the conventional clocking methodology seems to necessitate innovation in terms of low-cost implementation. Frequency synthesis with active, inductor-less oscillators replacing LC counterparts are reviewed, and solutions for two major drawbacks are proposed. Each solution is verified by prototype chip design, giving a possibility that the inductor-less oscillator may become a proper candidate for future high-speed serial links. To mitigate the high flicker noise of a high-frequency ring oscillator (RO), a reference multiplication technique that effectively extends the bandwidth of the following all-digital phase-locked loop (ADPLL) is proposed. The technique avoids any jitter accumulation, generating a clean mid-frequency clock, overall achieving high jitter performance in conjunction with the ADPLL. Timing constraint for the proper reference multiplication is first analyzed to determine the calibration points that may correct the existent phase errors. The weight for each calibration point is updated by the proposed a priori probability-based least-mean-square (LMS) algorithm. To minimize the time required for the calibration, each gain for the weight update is adaptively varied by deducing a posteriori which error source dominates the others. The prototype chip is fabricated in a 40-nm CMOS technology, and its measurement results verify the low-jitter, high-frequency clock generation with fast calibration settling. The presented work achieves an rms jitter of 177/223 fs at 8/16-GHz output, consuming 12.1/17-mW power. As the second embodiment, an RO-based ADPLL with an analog technique that addresses the high supply sensitivity of the RO is presented. Unlike prior arts, the circuit for the proposed technique does not extort the RO voltage headroom, allowing high-frequency oscillation. Further, the performance given from the technique is robust over process, voltage, and temperature (PVT) variations, avoiding the use of additional calibration hardware. Lastly, a comprehensive analysis of phase noise contribution is conducted for the overall ADPLL, followed by circuit optimizations, to retain the low-jitter output. Implemented in a 40-nm CMOS technology, the frequency synthesizer achieves an rms jitter of 289 fs at 8 GHz output without any injected supply noise. Under a 20-mVrms white supply noise, the ADPLL suppresses supply-noise-induced jitter by -23.8 dB.본 논문은 현대 시리얼 링크의 클락킹에 관여되는 주요한 문제들에 대하여 기술한다. 준속도, 다중 표준 구조들이 채택되고 있는 추세에 따라, 기존의 클라킹 방법은 낮은 비용의 구현의 관점에서 새로운 혁신을 필요로 한다. LC 공진기를 대신하여 능동 소자 발진기를 사용한 주파수 합성에 대하여 알아보고, 이에 발생하는 두가지 주요 문제점과 각각에 대한 해결 방안을 탐색한다. 각 제안 방법을 프로토타입 칩을 통해 그 효용성을 검증하고, 이어서 능동 소자 발진기가 미래의 고속 시리얼 링크의 클락킹에 사용될 가능성에 대해 검토한다. 첫번째 시연으로써, 고주파 고리 발진기의 높은 플리커 잡음을 완화시키기 위해 기준 신호를 배수화하여 뒷단의 위상 고정 루프의 대역폭을 효과적으로 극대화 시키는 회로 기술을 제안한다. 본 기술은 지터를 누적 시키지 않으며 따라서 깨끗한 중간 주파수 클락을 생성시켜 위상 고정 루프와 함께 높은 성능의 고주파 클락을 합성한다. 기준 신호를 성공적으로 배수화하기 위한 타이밍 조건들을 먼저 분석하여 타이밍 오류를 제거하기 위한 방법론을 파악한다. 각 교정 중량은 연역적 확률을 기반으로한 LMS 알고리즘을 통해 갱신되도록 설계된다. 교정에 필요한 시간을 최소화 하기 위하여, 각 교정 이득은 타이밍 오류 근원들의 크기를 귀납적으로 추론한 값을 바탕으로 지속적으로 제어된다. 40-nm CMOS 공정으로 구현된 프로토타입 칩의 측정을 통해 저소음, 고주파 클락을 빠른 교정 시간안에 합성해 냄을 확인하였다. 이는 177/223 fs의 rms 지터를 가지는 8/16 GHz의 클락을 출력한다. 두번째 시연으로써, 고리 발진기의 높은 전원 노이즈 의존성을 완화시키는 기술이 포함된 주파수 합성기가 설계되었다. 이는 고리 발진기의 전압 헤드룸을 보존함으로서 고주파 발진을 가능하게 한다. 나아가, 전원 노이즈 감소 성능은 공정, 전압, 온도 변동에 대하여 민감하지 않으며, 따라서 추가적인 교정 회로를 필요로 하지 않는다. 마지막으로, 위상 노이즈에 대한 포괄적 분석과 회로 최적화를 통하여 주파수 합성기의 저잡음 출력을 방해하지 않는 방법을 고안하였다. 해당 프로토타입 칩은 40-nm CMOS 공정으로 구현되었으며, 전원 노이즈가 인가되지 않은 상태에서 289 fs의 rms 지터를 가지는 8 GHz의 클락을 출력한다. 또한, 20 mVrms의 전원 노이즈가 인가되었을 때에 유도되는 지터의 양을 -23.8 dB 만큼 줄이는 것을 확인하였다.1 Introduction 1 1.1 Motivation 3 1.1.1 Clocking in High-Speed Serial Links 4 1.1.2 Multi-Phase, High-Frequency Clock Conversion 8 1.2 Dissertation Objectives 10 2 RO-Based High-Frequency Synthesis 12 2.1 Phase-Locked Loop Fundamentals 12 2.2 Toward All-Digital Regime 15 2.3 RO Design Challenges 21 2.3.1 Oscillator Phase Noise 21 2.3.2 Challenge 1: High Flicker Noise 23 2.3.3 Challenge 2: High Supply Noise Sensitivity 26 3 Filtering RO Noise 28 3.1 Introduction 28 3.2 Proposed Reference Octupler 34 3.2.1 Delay Constraint 34 3.2.2 Phase Error Calibration 38 3.2.3 Circuit Implementation 51 3.3 IL-ADPLL Implementation 55 3.4 Measurement Results 59 3.5 Summary 63 4 RO Supply Noise Compensation 69 4.1 Introduction 69 4.2 Proposed Analog Closed Loop for Supply Noise Compensation 72 4.2.1 Circuit Implementation 73 4.2.2 Frequency-Domain Analysis 76 4.2.3 Circuit Optimization 81 4.3 ADPLL Implementation 87 4.4 Measurement Results 90 4.5 Summary 98 5 Conclusions 99 A Notes on the 8REF 102 B Notes on the ACSC 105박

SNU Open Repository and Archive

Design and modelling of clock and data recovery integrated circuit in 130 nm CMOS technology for 10 Gb/s serial data communications

Author: Assaad Maher
Publication venue
Publication date: 01/01/2009
Field of study

This thesis describes the design and implementation of a fully monolithic 10 Gb/s phase and frequency-locked loop based clock and data recovery (PFLL-CDR) integrated circuit, as well as the Verilog-A modeling of an asynchronous serial link based chip to chip communication system incorporating the proposed concept. The proposed design was implemented and fabricated using the 130 nm CMOS technology offered by UMC (United Microelectronics Corporation). Different PLL-based CDR circuits topologies were investigated in terms of architecture and speed. Based on the investigation, we proposed a new concept of quarter-rate (i.e. the clocking speed in the circuit is 2.5 GHz for 10 Gb/s data rate) and dual-loop topology which consists of phase-locked and frequency-locked loop. The frequency-locked loop (FLL) operates independently from the phase-locked loop (PLL), and has a highly-desired feature that once the proper frequency has been acquired, the FLL is automatically disabled and the PLL will take over to adjust the clock edges approximately in the middle of the incoming data bits for proper sampling. Another important feature of the proposed quarter-rate concept is the inherent 1-to-4 demultiplexing of the input serial data stream. A new quarter-rate phase detector based on the non-linear early-late phase detector concept has been used to achieve the multi-Giga bit/s speed and to eliminate the need of the front-end data pre-processing (edge detecting) units usually associated with the conventional CDR circuits. An eight-stage differential ring oscillator running at 2.5 GHz frequency center was used for the voltage-controlled oscillator (VCO) to generate low-jitter multi-phase clock signals. The transistor level simulation results demonstrated excellent performances in term of locking speed and power consumption. In order to verify the accuracy of the proposed quarter-rate concept, a clockless asynchronous serial link incorporating the proposed concept and communicating two chips at 10 Gb/s has been modelled at gate level using the Verilog-A language and time-domain simulated

Glasgow Theses Service

Energy-efficient wireline transceivers

Author: Shu Guanghua
Publication venue
Publication date: 01/12/2016
Field of study

Power-efficient wireline transceivers are highly demanded by many applications in high performance computation and communication systems. Apart from transferring a wide range of data rates to satisfy the interconnect bandwidth requirement, the transceivers have very tight power budget and are expected to be fully integrated. This thesis explores enabling techniques to implement such transceivers in both circuit and system levels. Specifically, three prototypes will be presented: (1) a 5Gb/s reference-less clock and data recovery circuit (CDR) using phase-rotating phase-locked loop (PRPLL) to conduct phase control so as to break several fundamental trade-offs in conventional receivers; (2) a 4-10.5Gb/s continuous-rate CDR with novel frequency acquisition scheme based on bang-bang phase detector (BBPD) and a ring oscillator-based fractional-N PLL as the low noise wide range DCO in the CDR loop; (3) a source-synchronous energy-proportional link with dynamic voltage and frequency scaling (DVFS) and rapid on/off (ROO) techniques to cut the link power wastage at system level. The receiver/transceiver architectures are highly digital and address the requirements of new receiver architecture development, wide operating range, and low power/area consumption while being fully integrated. Experimental results obtained from the prototypes attest the effectiveness of the proposed techniques

Illinois Digital Environment for Access to Learning and Scholarship Repository

Hybrid NRZ/Multi-Tone Signaling for High-Speed Low-Power Wireline Transceivers

Author: Gharibdoust Kiarash
Publication venue: Lausanne, EPFL
Publication date: 19/04/2016
Field of study

Over the past few decades, incessant growth of Internet networking traffic and High-Performance Computing (HPC) has led to a tremendous demand for data bandwidth. Digital communication technologies combined with advanced integrated circuit scaling trends have enabled the semiconductor and microelectronic industry to dramatically scale the bandwidth of high-loss interfaces such as Ethernet, backplane, and Digital Subscriber Line (DSL). The key to achieving higher bandwidth is to employ equalization technique to compensate the channel impairments such as Inter-Symbol Interference (ISI), crosstalk, and environmental noise. Therefore, todayâs advanced input/outputs (I/Os) has been equipped with sophisticated equalization techniques to push beyond the uncompensated bandwidth of the system. To this end, process scaling has continually increased the data processing capability and improved the I/O performance over the last 15 years. However, since the channel bandwidth has not scaled with the same pace, the required signal processing and equalization circuitry becomes more and more complicated. Thereby, the energy efficiency improvements are largely offset by the energy needed to compensate channel impairments. In this design paradigm, re-thinking about the design strategies in order to not only satisfy the bandwidth performance, but also to improve power-performance becomes an important necessity. It is well known in communication theory that coding and signaling schemes have the potential to provide superior performance over band-limited channels. However, the choice of the optimum data communication algorithm should be considered by accounting for the circuit level power-performance trade-offs. In this thesis we have investigated the application of new algorithm and signaling schemes in wireline communications, especially for communication between microprocessors, memories, and peripherals. A new hybrid NRZ/Multi-Tone (NRZ/MT) signaling method has been developed during the course of this research. The system-level and circuit-level analysis, design, and implementation of the proposed signaling method has been performed in the frame of this work, and the silicon measurement results have proved the efficiency and the robustness of the proposed signaling methodology for wireline interfaces. In the first part of this work, a 7.5 Gb/s hybrid NRZ/MT transceiver (TRX) for multi-drop bus (MDB) memory interfaces is designed and fabricated in 40 nm CMOS technology. Reducing the complexity of the equalization circuitry on the receiver (RX) side, the proposed architecture achieves 1 pJ/bit link efficiency for a MDB channel bearing 45 dB loss at 2.5 GHz. The measurement results of the first prototype confirm that NRZ/MT serial data TRX can offer an energy-efficient solution for MDB memory interfaces. Motivated by the satisfying results of the first prototype, in the second phase of this research we have exploited the properties of multi-tone signaling, especially orthogonality among different sub-bands, to reduce the effect of crosstalk in high-dense wireline interconnects. A four-channel transceiver has been implemented in a standard CMOS 40 nm technology in order to demonstrate the performance of NRZ/MT signaling in presence of high channel loss and strong crosstalk noise. The proposed system achieves 1 pJ/bit power efficiency, while communicating over a MDB memory channel at 36 Gb/s aggregate data rate

Infoscience - École polytechnique fédérale de Lausanne

Recommended from our members

Heterogeneous Integration on Silicon-Interconnect Fabric using fine-pitch interconnects (≤10 �m)

Author: Jangam SivaChandra
Publication venue: eScholarship, University of California
Publication date: 01/01/2020
Field of study

Today, the ever-growing data-bandwidth demand is pushing the boundaries of the traditional printed circuit board (PCB) based integration schemes. Moreover, with the apparent saturation of semiconductor scaling, commonly called Moore's law, system scaling warrants a paradigm shift in packaging technologies, assembly techniques, and integration methodologies. In this work, a superior alternative to PCBs called the Silicon-Interconnect Fabric (Si-IF) is investigated. The Si-IF is a silicon-based, package-less, fine-pitch, highly scalable, heterogeneous integration platform for wafer-scale systems. In this technology, unpackaged dielets are assembled on the Si-IF at small inter-dielet spacings (≤100 �m) using fine-pitch (≤10 �m) die-to-substrate interconnects. A novel assembly process using a solder-less direct metal-metal (gold-gold and copper-copper) thermal compression bonding was developed. Using this process, sub-10 �m pitch interconnects with a low specific contact resistance of ≤0.7 Ω-�m2 were successfully demonstrated. Because of the tightly packed Si-IF assembly, the communication links between the neighboring dies are short (≤500 �m) with low loss (≤2 dB), comparable to on-chip connections. Consequently, simple buffers can transfer data between dies using a Simple Universal Parallel intERface for chips (SuperCHIPS) at low latency (<30 ps), low energy per bit (≤0.03 pJ/b), and high data-rates (up to 10 Gbps/link), corresponding to an aggregate bandwidth up to 8 Tbps/mm. The benefits of the SuperCHIPS protocol were experimentally demonstrated to provide 5-90X higher data-bandwidth, 8-30X lower latency, and 5-40X lower energy per bit compared to existing integration schemes. This dissertation addresses the assembly technology and communication protocols of the Si-IF technology

eScholarship - University of California

Digital enhancement techniques for fractional-N frequency synthesizers

Author: Elkholy Ahmed Mostafa Mohamed Attia
Publication venue
Publication date: 01/12/2016
Field of study

Meeting the demand for unprecedented connectivity in the era of internet-of-things (IoT) requires extremely energy efficient operation of IoT nodes to extend battery life. Managing the data traffic generated by trillions of such nodes also puts severe energy constraints on the data centers. Clock generators that are essential elements in these systems consume significant power and therefore must be optimized for low power and high performance. The focus of this thesis is on improving the energy efficiency of frequency synthesizers and clocking modules by exploring design techniques at both the architectural and circuit levels. In the first part of this work, a digital fractional-N phase locked loop (FNPLL) that employs a high resolution time-to-digital converter (TDC) and a truly ΔΣ fractional divider to achieve low in-band noise with a wide bandwidth is presented. The fractional divider employs a digital-to-time converter (DTC) to cancel out ΔΣ quantization noise in time domain, thus alleviating TDC dynamic range requirements. The proposed digital architecture adopts a narrow range low-power time-amplifier based TDC (TA-TDC) to achieve sub 1ps resolution. Fabricated in 65nm CMOS process, the prototype PLL achieves better than -106dBc/Hz in-band noise and 3MHz PLL bandwidth at 4.5GHz output frequency using 50MHz reference. The PLL achieves excellent jitter performance of 490fsrms, while consumes only 3.7mW. This translates to the best reported jitter-power figure-of-merit (FoM) of -240.5dB among previously reported FNPLLs. Phase noise performance of ring oscillator based digital FNPLLs is severely compromised by conflicting bandwidth requirements to simultaneously suppress oscillator phase and quantization noise introduced by the TDC, ΔΣ fractional divider, and digital-to-analog converter (DAC). As a consequence, their FoM that quantifies the power-jitter tradeoff is at least 25dB worse than their LC-oscillator based FNPLL counterparts. In the second part of this thesis, we seek to close this performance gap by extending PLL bandwidth using quantization noise cancellation techniques and by employing a dual-path digital loop filter to suppress the detrimental impact of DAC quantization noise. A prototype was implemented in a 65nm CMOS process operating over a wide frequency range of 2.0GHz-5.5GHz using a modified extended range multi-modulus divider with seamless switching. The proposed digital FNPLL achieves 1.9psrms integrated jitter while consuming only 4mW at 5GHz output. The measured in-band phase noise is better than -96 dBc/Hz at 1MHz offset. The proposed FNPLL achieves wide bandwidth up to 6MHz using a 50 MHz reference and its FoM is -228.5dB, which is at about 20dB better than previously reported ring-based digital FNPLLs. In the third part, we propose a new multi-output clock generator architecture using open loop fractional dividers for system-on-chip (SoC) platforms. Modern multi-core processors use per core clocking, where each core runs at its own speed. The core frequency can be changed dynamically to optimize for performance or power dissipation using a dynamic frequency scaling (DFS) technique. Fast frequency switching is highly desirable as long as it does not interrupt code execution; therefore it requires smooth frequency transitions with no undershoots. The second main requirement in processor clocking is the capability of spread spectrum frequency modulation. By spreading the clock energy across a wide bandwidth, the electromagnetic interference (EMI) is dramatically reduced. A conventional PLL clock generation approach suffers from a slow frequency settling and limited spread spectrum modulation capabilities. The proposed open loop fractional divider architecture overcomes the bandwidth limitation in fractional-N PLLs. The fractional divider switches the output frequency instantaneously and provides an excellent spread spectrum performance, where precise and programmable modulation depth and frequency can be applied to satisfy different EMI requirements. The fractional divider has unlimited modulation bandwidth resulting in spread spectrum modulation with no filtering, unlike fractional-N PLL; consequently it achieves higher EMI reduction. A prototype fractional divider was implemented in a 65nm CMOS process, where the measured peak-to-peak jitter is less than 27ps over a wide frequency range from 20MHz to 1GHz. The total power consumption is about 3.2mW for 1GHz output frequency. The all-digital implementation of the divider occupies the smallest area of 0.017mm2 compared to state-of-the-art designs. As the data rate of serial links goes higher, the jitter requirements of the clock generator become more stringent. Improving the jitter performance of conventional PLLs to less than (200fsrms) always comes with a large power penalty (tens of mWs). This is due to the PLL coupled noise bandwidth trade-off, which imposes stringent noise requirements on the oscillator and/or loop components. Alternatively, an injection-locked clock multiplier (ILCM) provides many advantages in terms of phase noise, power, and area compared to classical PLLs, but they suffer from a narrow lock-in range and a high sensitivity to PVT variations especially at a large multiplication factor (N). In the fourth part of this thesis, a low-jitter, low-power LC-based ILCM with a digital frequency-tracking loop (FTL) is presented. The proposed FTL relies on a new pulse gating technique to continuously tune the oscillator's free-running frequency. The FTL ensures robust operation across PVT variations and resolves the race condition existing in injection locked PLLs by decoupling frequency tuning from the injection path. As a result, the phase locking condition is only determined by the injection path. This work also introduces an accurate theoretical large-signal analysis for phase domain response (PDR) of injection locked oscillators (ILOs). The proposed PDR analysis captures the asymmetric nature of ILO's lock-in range, and the impact of frequency error on injection strength and phase noise performance. The proposed architecture and analysis are demonstrated by a prototype fabricated in 65 nm CMOS process with active area of 0.25mm2. The prototype ILCM multiplies the reference frequency by 64 to generate an output clock in the range of 6.75GHz-8.25GHz. A superior jitter performance of 190fsrms is achieved, while consuming only 2.25mW power. This translates to a best FoM of -251dB. Unlike conventional PLLs, ILCMs have been fundamentally limited to only integer-N operation and cannot synthesize fractional-N frequencies. In the last part of this thesis, we extend the merits of ILCMs to fractional-N and overcome this fundamental limitation. We employ DTC-based QNC techniques in order to align injected pulses to the oscillator's zero crossings, which enables it to pull the oscillator toward phase lock, thus realizing a fractional-N ILCM. Fabricated in 65nm CMOS process, a prototype 20-bit fractional-N ILCM with an output range of 6.75GHz-8.25GHz consumes only 3.25mW. It achieves excellent jitter performance of 110fsrms and 175fsrms in integer- and fractional-N modes respectively, which translates to the best-reported FoM in both integer- (-255dB) and fractional-N (-252dB) modes. The proposed fractional-N ILCM also features the first-reported rapid on/off capability, where the transient absolute jitter performance at wake-up is bounded below 4ps after less than 4ns. This demonstrates almost instantaneous phase settling. This unique capability enables tremendous energy saving by turning on the clock multiplier only when needed. This energy proportional operation leverages idle times to save power at the system-level of wireline and wireless transceivers

Illinois Digital Environment for Access to Learning and Scholarship Repository

Clock Generator Circuits for Low-Power Heterogeneous Multiprocessor Systems-on-Chip

Author: Höppner Sebastian
Publication venue
Publication date: 25/07/2013
Field of study

In this work concepts and circuits for local clock generation in low-power heterogeneous multiprocessor systems-on-chip (MPSoCs) are researched and developed. The targeted systems feature a globally asynchronous locally synchronous (GALS) clocking architecture and advanced power management functionality, as for example fine-grained ultra-fast dynamic voltage and frequency scaling (DVFS). To enable this functionality compact clock generators with low chip area, low power consumption, wide output frequency range and the capability for ultra-fast frequency changes are required. They are to be instantiated individually per core. For this purpose compact all digital phase-locked loop (ADPLL) frequency synthesizers are developed. The bang-bang ADPLL architecture is analyzed using a numerical system model and optimized for low jitter accumulation. A 65nm CMOS ADPLL is implemented, featuring a novel active current bias circuit which compensates the supply voltage and temperature sensitivity of the digitally controlled oscillator (DCO) for reduced digital tuning effort. Additionally, a 28nm ADPLL with a new ultra-fast lock-in scheme based on single-shot phase synchronization is proposed. The core clock is generated by an open-loop method using phase-switching between multi-phase DCO clocks at a fixed frequency. This allows instantaneous core frequency changes for ultra-fast DVFS without re-locking the closed loop ADPLL. The sensitivity of the open-loop clock generator with respect to phase mismatch is analyzed analytically and a compensation technique by cross-coupled inverter buffers is proposed. The clock generators show small area (0.0097mm2 (65nm), 0.00234mm2 (28nm)), low power consumption (2.7mW (65nm), 0.64mW (28nm)) and they provide core clock frequencies from 83MHz to 666MHz which can be changed instantaneously. The jitter performance is compliant to DDR2/DDR3 memory interface specifications. Additionally, high-speed clocks for novel serial on-chip data transceivers are generated. The ADPLL circuits have been verified successfully by 3 testchip implementations. They enable efficient realization of future low-power MPSoCs with advanced power management functionality in deep-submicron CMOS technologies.In dieser Arbeit werden Konzepte und Schaltungen zur lokalen Takterzeugung in heterogenen Multiprozessorsystemen (MPSoCs) mit geringer Verlustleistung erforscht und entwickelt. Diese Systeme besitzen eine global-asynchrone lokal-synchrone Architektur sowie Funktionalität zum Power Management, wie z.B. das feingranulare, schnelle Skalieren von Spannung und Taktfrequenz (DVFS). Um diese Funktionalität zu realisieren werden kompakte Taktgeneratoren benötigt, welche eine kleine Chipfläche einnehmen, wenig Verlustleitung aufnehmen, einen weiten Bereich an Ausgangsfrequenzen erzeugen und diese sehr schnell ändern können. Sie sollen individuell pro Prozessorkern integriert werden. Dazu werden kompakte volldigitale Phasenregelkreise (ADPLLs) entwickelt, wobei eine bang-bang ADPLL Architektur numerisch modelliert und für kleine Jitterakkumulation optimiert wird. Es wird eine 65nm CMOS ADPLL implementiert, welche eine neuartige Kompensationsschlatung für den digital gesteuerten Oszillator (DCO) zur Verringerung der Sensitivität bezüglich Versorgungsspannung und Temperatur beinhaltet. Zusätzlich wird eine 28nm CMOS ADPLL mit einer neuen Technik zum schnellen Einschwingen unter Nutzung eines Phasensynchronisierers realisiert. Der Prozessortakt wird durch ein neuartiges Phasenmultiplex- und Frequenzteilerverfahren erzeugt, welches es ermöglicht die Taktfrequenz sofort zu ändern um schnelles DVFS zu realisieren. Die Sensitivität dieses Frequenzgenerators bezüglich Phasen-Mismatch wird theoretisch analysiert und durch Verwendung von kreuzgekoppelten Taktverstärkern kompensiert. Die hier entwickelten Taktgeneratoren haben eine kleine Chipfläche (0.0097mm2 (65nm), 0.00234mm2 (28nm)) und Leistungsaufnahme (2.7mW (65nm), 0.64mW (28nm)). Sie stellen Frequenzen von 83MHz bis 666MHz bereit, welche sofort geändert werden können. Die Schaltungen erfüllen die Jitterspezifikationen von DDR2/DDR3 Speicherinterfaces. Zusätzliche können schnelle Takte für neuartige serielle on-Chip Verbindungen erzeugt werden. Die ADPLL Schaltungen wurden erfolgreich in 3 Testchips erprobt. Sie ermöglichen die effiziente Realisierung von zukünftigen MPSoCs mit Power Management in modernsten CMOS Technologien

Technische Universität Dresden: Qucosa

Recommended from our members

Cross-Layer Pathfinding for Off-Chip Interconnects

Author: Srinivas Vaishnav
Publication venue: eScholarship, University of California
Publication date: 01/01/2019
Field of study

Off-chip interconnects for integrated circuits (ICs) today induce a diverse design space, spanning many different applications that require transmission of data at various bandwidths, latencies and link lengths. Off-chip interconnect design solutions are also variously sensitive to system performance, power and cost metrics, while also having a strong impact on these metrics. The costs associated with off-chip interconnects include die area, package (PKG) and printed circuit board (PCB) area, technology and bill of materials (BOM). Choices made regarding off-chip interconnects are fundamental to product definition, architecture, design implementation and technology enablement. Given their cross-layer impact, it is imperative that a cross-layer approach be employed to architect and analyze off-chip interconnects up front, so that a top-down design flow can comprehend the cross-layer impacts and correctly assess the system performance, power and cost tradeoffs for off-chip interconnects. Chip architects are not exposed to all the tradeoffs at the physical and circuit implementation or technology layers, and often lack the tools to accurately assess off-chip interconnects. Furthermore, the collaterals needed for a detailed analysis are often lacking when the chip is architected; these include circuit design and layout, PKG and PCB layout, and physical floorplan and implementation. To address the need for a framework that enables architects to assess the system-level impact of off-chip interconnects, this thesis presents power-area-timing (PAT) models for off-chip interconnects, optimization and planning tools with the appropriate abstraction using these PAT models, and die/PKG/PCB co-design methods that help expose the off-chip interconnect cross-layer metrics to the die/PKG/PCB design flows. Together, these models, tools and methods enable cross-layer optimization that allows for a top-down definition and exploration of the design space and helps converge on the correct off-chip interconnect implementation and technology choice. The tools presented cover off-chip memory interfaces for mobile and server products, silicon photonic interfaces, 2.5D silicon interposers and 3D through-silicon vias (TSVs). The goal of the cross-layer framework is to assess the key metrics of the interconnect (such as timing, latency, active/idle/sleep power, and area/cost) at an appropriate level of abstraction by being able to do this across layers of the design flow. In additional to signal interconnect, this thesis also explores the need for such cross-layer pathfinding for power distribution networks (PDN), where the system-on-chip (SoC) floorplan and pinmap must be optimized before the collateral layouts for PDN analysis are ready. Altogether, the developed cross-layer pathfinding methodology for off-chip interconnects enables more rapid and thorough exploration of a vast design space of off-chip parallel and serial links, inter-die and inter-chiplet links and silicon photonics. Such exploration will pave the way for off-chip interconnect technology enablement that is optimized for system needs. The basis of the framework can be extended to cover other interconnect technology as well, since it fundamentally relates to system-level metrics that are common to all off-chip interconnects

eScholarship - University of California

Design and realization of a 2.4 Gbps - 3.2 Gbps clock and data recovery circuit

Author: Gursoy Zafer Ozgur
Gürsoy Zafer Özgür
Publication venue
Publication date: 01/01/2003
Field of study

This thesis presents the design, verification, system integration and the physical realization of a high-speed monolithic phase-locked loop (PLL) based clock and data recovery (CDR) circuit. The architecture of the CDR has been realized as a two-loop structure consisting of coarse and fine loops, each of which is capable of processing the incoming low-speed reference clock and high-speed random data. At start up, the coarse loop provides fast locking to the system frequency with the help of the reference clock. After the VCO clock reaches a proximity of system frequency , the LOCK signal is generated and the coarse loop is tumed off, while the fine loop is tumed on. Fine loop tracks the phase of the generated clock with respect to the data and aligns the VCO clock such that its rising edge is in the middle of data eye. The speed and symmetry of sub-blocks in fine loop are extremely important, since all asymmetric charging effects, skew and setup/hold problems in this loop translate into a static phase error at the clock output. The entire circuit architecture is built with a special low-voltage circuit design technique. All analogue as well as digital sub-blocks of the CDR architecture presented in this work operate on a differential signalling, which significantly makes the design more complex while ensuring a more robust perforrnance. Other important features of this CDR include small area, single power supply, low power consumption, capability to operate at very high data rates, and the ability to handle between 2.4 Gbps and 3.2 Gbps data rate. The CDR architecture was realized using a conventional 0.13-mikrometer digital CMOS technology (Foundry: UMC), which ensures a lower overall cost and better portability for the design. The CDR architecture presented in this work is capable of operating at sampling frequencies of up to 3.2 GHz, and still can achieve the robust phase alignrnent. The entire circuit is designed with single 1.2 V power supply .The overall power consumption is estimated as 18.6 mW at 3.2 GHz sampling rate. The overall silicon area of the CDR is approximately 0.3 mm^2 with its internal loop filter capacitors. Other researchers have reported similar featured PLL-based clock and data recovery circuits in terms of operating data rate, architecture and jitter performance. To the best of our knowledge, this clock recovery uses the advantage of being the first high-speed CDR designed in CMOS 0.13 mikrometer technology with the superiority on power consumption and area considerations among others. The CDR architecture presented in this thesis is intended, as a state-of-the-art clock recovery for high-speed applications such as optical communications or high bandwidth serial wireline communication needs. It can be used either as a stand-alone single-chip unit, or as an embedded intellectual property (IP) block that can be integrated with other modules on chip

Sabanci University Research Database

데이터 전송로 확장성과 루프 선형성을 향상시킨 다중채널 수신기들에 관한 연구

Author: 유병주
Publication venue: 서울대학교 대학원
Publication date: 01/02/2013
Field of study

학위논문 (박사)-- 서울대학교 대학원 : 전기·컴퓨터공학부, 2013. 2. 정덕균.Two types of serial data communication receivers that adopt a multichannel architecture for a high aggregate I/O bandwidth are presented. Two techniques for collaboration and sharing among channels are proposed to enhance the loop-linearity and channel-expandability of multichannel receivers, respectively. The first proposed receiver employs a collaborative timing scheme recovery which relies on the sharing of all outputs of phase detectors (PDs) among channels to extract common information about the timing and multilevel signaling architecture of PAM-4. The shared timing information is processed by a common global loop filter and is used to update the phase of the voltage-controlled oscillator with better rejection of per-channel noise. In addition to collaborative timing recovery, a simple linearization technique for binary PDs is proposed. The technique realizes a high-rate oversampling PD while the hardware cost is equivalent to that of a conventional 2x-oversampling clock and data recovery. The first receiver exploiting the collaborative timing recovery architecture is designed using 45-nm CMOS technology. A single data lane occupies a 0.195-mm2 area and consumes a relatively low 17.9 mW at 6 Gb/s at 1.0V. Therefore, the power efficiency is 2.98 mW/Gb/s. The simulated jitter is about 0.034 UI RMS given an input jitter value of 0.03 UI RMS, while the relatively constant loop bandwidth with the PD linearization technique is about 7.3-MHz regardless of the data-stream noise. Unlike the first receiver, the second proposed multichannel receiver was designed to reduce the hardware complexity of each lane. The receiver employs shared calibration logic among channels and yet achieves superior channel expandability with slim data lanes. A shared global calibration control, which is used in a forwarded clock receiver based on a multiphase delay-locked loop, accomplishes skew calibration, equalizer adaptation, and the phase lock of all channels during a calibration period, resulting in reduced hardware overhead and less area required by each data lane. The second forwarded clock receiver is designed in 90-nm CMOS technology. It achieves error-free eye openings of more than 0.5 UI across 9− 28 inch Nelco 4000-6 microstrips at 4− 7 Gb/s and more than 0.42 UI at data rates of up to 9 Gb/s. The data lane occupies only 0.152 mm2 and consumes 69.8 mW, while the rest of the receiver occupies 0.297 mm2 and consumes 56 mW at a data rate of 7 Gb/s and a supply voltage of 1.35 V.1. Introduction 1 1.1 Motivations 1.2 Thesis Organization 2. Previous Receivers for Serial-Data Communications 2.1 Classification of the Links 2.2 Clocking architecture of transceivers 2.3 Components of receiver 2.3.1 Channel loss 2.3.2 Equalizer 2.3.3 Clock and data recovery circuit 2.3.3.1. Basic architecture 2.3.3.2. Phase detector 2.3.3.2.1. Linear phase detector 2.3.3.2.2. Binary phase detector 2.3.3.3. Frequency detector 2.3.3.4. Charge pump 2.3.3.5. Voltage controlled oscillator and delay-line 2.3.4 Loop dynamics of PLL 2.3.5 Loop dynamics of DLL 3. The Proposed PLL-Based Receiver with Loop Linearization Technique 3.1 Introduction 3.2 Motivation 3.3 Overview of binary phase detection 3.4 The proposed BBPD linearization technique 3.4.1 Architecture of the proposed PLL-based receiver 3.4.2 Linearization technique of binary phase detection 3.4.3 Rotational pattern of sampling phase offset 3.5 PD gain analysis and optimization 3.6 Loop Dynamics of the 2nd-order CDR 3.7 Verification with the time-accurate behavioral simulation 3.8 Summary 4. The Proposed DLL-Based Receiver with Forwarded-Clock 4.1 Introduction 4.2 Motivation 4.3 Design consideration 4.4 Architecture of the proposed forwarded-clock receiver 4.5 Circuit description 4.5.1 Analog multi-phase DLL 4.5.2 Dual-input interpolating deley cells 4.5.3 Dedicated half-rate data samplers 4.5.4 Cherry-Hooper continuous-time linear equalizer 4.5.5 Equalizer adaptation and phase-lock scheme 4.6 Measurement results 5. Conclusion 6. BibliographyDocto

SNU Open Repository and Archive