5 research outputs found
Modeling and analysis of spur structure of digital-to-time conversion based frequency synthesizers
Frequency synthesizers are critical components of all communication systems. This thesis considers the issue of undesirable frequency spurs in a relatively recent type of frequency synthesis architecture based on digital-to-time conversion (DTC). The DTC-based frequency synthesis architecture has important performance benefits over older frequency synthesizers, such as fast frequency switching, large frequency range and fine frequency resolution. A DTC-based frequency synthesizer requires less power than a traditional direct-synthesis-based synthesizer with comparable frequency range, resolution and switching time. The DTC architecture is also easily scalable to newer low-cost digital complementary metal-oxide-semiconductor (CMOS) integrated circuit (IC) fabrication technologies. However, the DTC architecture suffers from an important undesirable characteristic: sub-harmonic spurious tones, hereafter referred to as spurs. Spurs have undesirable effects in both the transmitter and the receiver. In a transmitter, spurs create out-of-band emissions that may breach the spectral emission mask set by regulatory agencies to enable coexistence of multiple transmitters in a crowded frequency spectrum. In a receiver, an inopportunely located spur in the local oscillator (LO) signal can mix a strong out-of-band interfering signal into the baseband on top of a mixed-down weak desired signal. Unlike harmonic spurs, which are known to lie at multiples of the carrier frequency, sub-harmonic spurs are especially problematic because they have been difficult to predict during the design process. In fact, the spur patterns for most pairs of closely spaced desired output frequencies of a DTC-based frequency synthesizer are seemingly unrelated. While one output frequency setting might have an output spectrum with only a few spurs, many nearby output frequency settings might have output spectra with many weaker spurs.
The primary contribution of this thesis is the development of spur creation models and analysis tools that can predict the spur spectrum and spur power levels for a DTC-based frequency synthesizer. This is an important contribution for assuring the achievable performance of a frequency synthesizer during the design process. The modeling approach accounts for more than 99% of spur spectral locations. Predicted power levels for more than 95% of the spurs are within 10 dB of those of actual fabricated DTC-based frequency synthesizer ICs. The results developed in this thesis allow for an understanding of the relationship between spur patterns for different selected output frequencies.
In the research reported in this thesis, the spur spectrum for a selected output frequency is shown to be due to periodic occurrences of errors in the locations of the rising and falling edges of the output signal. Error sequences for different selected output frequencies are shown to be related in a way that can be exploited by applying the axis-scaling property of the Discrete Fourier Transform (DFT). The axis-scaling property of the DFT relates the transforms of sequences that are predictably permuted versions of each other: their respective transforms are also (differently) permuted versions of each other. One key insight of this thesis is the discovery that the time-domain errors for all output frequencies can be classified into a very small number of error sequence classes. All error sequences within a class are shown to be predictable permutations of each other. This insight, along with the DFT axis-scaling property, permits the respective spur spectra to be classified into spur spectra classes. All error spectra within a spur spectra class are predictable permutations of each other. There are two sources of edge errors: quantization error and buffer delay error. This classification of spur spectra into a few classes is shown to be possible for both sources of error. In this thesis, the case of quantization-only error is considered first. The analysis is then extended to the case in which both sources of error are present.
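The DFT axis-scaling property mentioned above can be checked numerically. The following is a minimal sketch, not the thesis's analysis tool: it assumes a length-N sequence whose index is scaled by an integer `a` coprime to N, and the function names `dft`, `scale_axis` and `permute_spectrum` are hypothetical helpers introduced only for illustration.

```python
import cmath
from math import gcd


def dft(x):
    """Naive O(N^2) discrete Fourier transform of a sequence."""
    n = len(x)
    return [
        sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]


def scale_axis(x, a):
    """Permute a sequence by index scaling: y[m] = x[(a * m) mod N]."""
    n = len(x)
    if gcd(a, n) != 1:
        raise ValueError("a must be coprime to the sequence length")
    return [x[(a * m) % n] for m in range(n)]


def permute_spectrum(spectrum, a):
    """Apply the matching spectral permutation: Y[k] = X[(a^-1 * k) mod N]."""
    n = len(spectrum)
    a_inv = pow(a, -1, n)  # modular inverse, requires Python 3.8+
    return [spectrum[(a_inv * k) % n] for k in range(n)]


# Illustrative data only; any length-N sequence works.
x = [0.0, 1.0, 4.0, 2.0, -1.0, 3.0, 0.5]
a = 3
lhs = dft(scale_axis(x, a))
rhs = permute_spectrum(dft(x), a)
max_diff = max(abs(p - q) for p, q in zip(lhs, rhs))
```

Here `max_diff` is at the level of floating-point rounding: the spectrum of the permuted sequence contains the same values as the original spectrum, only in a different (predictable) order. This is the kind of relationship that lets error spectra be grouped into classes.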
As a result of the modeling and analytical techniques developed for spur spectra classification described in this thesis, design tools have been created to predict the spur spectra of DTC-based synthesizer designs for all possible selected output frequencies.
Design of a DisplayPort Receiver Using a Video Clock Frequency Compensation Structure
Doctoral thesis, Seoul National University Graduate School, Department of Electrical and Computer Engineering, August 2014. 정덕균.
This thesis presents the design of a DisplayPort receiver. DisplayPort is a high-speed digital display interface that replaces existing interfaces such as DVI, HDMI and LVDS. Two prototype chips were fabricated: a 5.4/2.7/1.62-Gb/s multi-rate DisplayPort receiver and a 2.7/1.62-Gb/s multi-rate Embedded DisplayPort (eDP) receiver for an intra-panel display interface.
The first receiver, designed to support external box-to-box display connections, provides up to 4K resolution (4096×2160) with a maximum data rate of 21.6 Gb/s when all 4 lanes are used. The second targets internal chip-to-chip connections, such as those between graphics processors and display panels in notebooks or tablet PCs. It supports a maximum data rate of 10.8 Gb/s with 4-lane operation, which can provide WQXGA resolution (2560×1600). Since there is no dedicated clock channel, the receiver must contain a clock and data recovery (CDR) circuit to extract the link clock from the data stream. An all-digital CDR (ADCDR) is adopted for area efficiency and better performance in multi-rate operation. The link rate is fixed, but the video clock frequency range is fairly wide in order to support all display resolutions and frame rates. Thus, a wide-range video clock frequency synthesizer is essential for reconstructing the transmitted video data.
A source device starts link training before transmitting video data to recover the clock and establish the link. When synchronization between the source device and the sink device is lost, link training is usually restarted to re-establish the link. Since link training takes several milliseconds to initialize, the video image is not displayed properly on the sink device during this interval. The proposed clock recovery scheme, built on the ADCDR topology, can significantly shorten the time to recover from a link failure. Once the link is established after link training, the ADCDR memorizes the DCO codes of the synchronized state; when synchronization is lost, it restores the previous DCO code so that the clock is quickly recovered from the failure state without link re-training.
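The save-and-restore idea described above can be illustrated with a small behavioural model. This is only a sketch under stated assumptions: the class `ClockRecovery`, its method names and the integer DCO codes are hypothetical and are not taken from the thesis. In real hardware the CDR loop would still need to re-acquire phase after the code is restored, which the model abstracts away by setting `locked` directly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClockRecovery:
    """Toy model of an ADCDR that remembers its last locked DCO code."""

    dco_code: int = 0
    saved_code: Optional[int] = None
    locked: bool = False
    training_count: int = 0  # number of full link trainings performed

    def link_training(self, trained_code: int) -> None:
        # Full (slow) link training: establishes the link and stores the code.
        self.dco_code = trained_code
        self.saved_code = trained_code
        self.locked = True
        self.training_count += 1

    def track(self, new_code: int) -> None:
        # While locked, the loop keeps adjusting the code; remember the latest.
        if self.locked:
            self.dco_code = new_code
            self.saved_code = new_code

    def sync_lost(self, drifted_code: int) -> None:
        # Loss of synchronization: the loop has wandered to an invalid code.
        self.locked = False
        self.dco_code = drifted_code

    def recover(self) -> bool:
        # Fast path: restore the stored code instead of re-training the link.
        if self.saved_code is None:
            return False  # nothing stored yet, full link training is needed
        self.dco_code = self.saved_code
        self.locked = True  # simplification: re-acquisition is not modelled
        return True


cdr = ClockRecovery()
cdr.link_training(trained_code=512)
cdr.track(new_code=517)
cdr.sync_lost(drifted_code=40)
recovered = cdr.recover()
```

After `recover()` the model is back at code 517 with `training_count` still equal to 1, which mirrors the claim that the clock is restored without a second link training.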
A direct all-digital frequency synthesizer is proposed to generate a cycle-accurate video clock frequency. The video clock frequency has a wide range to cover all display formats and is determined by a division ratio of large M and N values. The proposed frequency synthesizer, which uses a programmable integer divider and a multi-phase switching fractional divider with delta-sigma modulation, exhibits better performance and reduces design complexity by operating from the existing clock of the ADCDR circuit. In an asynchronous clock system, the transmitted M value, which changes over time, is measured using a counter running over a long reference period (N cycles) and is updated once per blanking period. Thus, the transmitted M is not accurate due to its low update rate, transport latency and quantization error. The proposed frequency error compensation scheme resolves these problems by monitoring the status of the FIFO between the clock domains.
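The role of M and N, and the quantization error in a counted M, can be made concrete with a short calculation. The sketch below assumes the usual DisplayPort relation in which the video (stream) clock equals M/N times the link symbol clock. The specific numbers (a 270 MHz link symbol clock, a 148.5 MHz pixel clock, N = 32768) are illustrative assumptions, and the first-order accumulator is a generic delta-sigma divider control, not the multi-phase switching divider of the thesis.

```python
from fractions import Fraction


def stream_clock_hz(m, n, link_symbol_clock_hz):
    """Video clock implied by M and N: f_video = (M / N) * f_link_symbol."""
    return Fraction(m, n) * link_symbol_clock_hz


def measured_m(pixel_clock_hz, link_symbol_clock_hz, n=32768):
    """M as seen by a counter: whole pixel-clock cycles in N link-clock cycles."""
    return (pixel_clock_hz * n) // link_symbol_clock_hz


def delta_sigma_divider(ratio, cycles):
    """First-order delta-sigma control of a dual-modulus divider.

    Returns a list of integer divide values whose long-run average is `ratio`.
    """
    base = ratio.numerator // ratio.denominator
    frac = ratio - base
    acc = Fraction(0)
    sequence = []
    for _ in range(cycles):
        acc += frac
        if acc >= 1:  # accumulator overflow: divide by base + 1 this cycle
            acc -= 1
            sequence.append(base + 1)
        else:
            sequence.append(base)
    return sequence


# Assumed example values, for illustration only.
link_clk = 270_000_000
pixel_clk = 148_500_000
m = measured_m(pixel_clk, link_clk)               # exact value would be 18022.4
error_hz = pixel_clk - stream_clock_hz(m, 32768, link_clk)

# Exact ratio here is M/N = 11/20, so the divide ratio N/M is 20/11.
divide_values = delta_sigma_divider(Fraction(20, 11), cycles=11)
```

With these assumed numbers the counter reports M = 18022 instead of 18022.4, so a synthesizer that trusts M alone runs roughly 3.3 kHz slow. An offset of this kind is what a FIFO-level monitor can detect, since the FIFO slowly fills or drains. The divide sequence sums to 20 over 11 cycles, giving the average ratio 20/11.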
The first prototype chip is fabricated in a 65-nm CMOS process; the physical layer occupies 1.39 mm² and the estimated area of the link layer is 2.26 mm². The physical layer dissipates 86/101/116 mW at 1.62/2.7/5.4 Gb/s with all 4 lanes operating. The power consumption of the link layer is 107/145/167 mW at 1.62/2.7/5.4 Gb/s. The second prototype chip, fabricated in a 0.13-μm CMOS process, has a physical layer area of 1.59 mm² and a link layer area of 3.01 mm². The physical layer dissipates 21 mW at 1.62 Gb/s and 29 mW at 2.7 Gb/s with 2-lane operation. The power consumption of the link layer is 31 mW at 1.62 Gb/s and 41 mW at 2.7 Gb/s with 2-lane operation. The core of the video clock synthesizer occupies 0.04 mm² and dissipates 5.5 mW at the low bit rate and 9.1 mW at the high bit rate. The output frequency range is 25 to 330 MHz.
ABSTRACT
CONTENTS
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1 INTRODUCTION
1.1 BACKGROUND
1.2 MOTIVATION
1.3 THESIS ORGANIZATION
CHAPTER 2 DIGITAL DISPLAY INTERFACE
2.1 OVERVIEW
2.2 DISPLAYPORT INTERFACE CHARACTERISTICS
2.2.1 DISPLAYPORT VERSION 1.2
2.2.2 EMBEDDED DISPLAYPORT VERSION 1.2
2.3 DISPLAYPORT INTERFACE ARCHITECTURE
2.3.1 LAYERED ARCHITECTURE
2.3.2 MAIN STREAM PROTOCOL
2.3.3 INITIALIZATION AND LINK TRAINING
2.3.4 VIDEO STREAM CLOCK RECOVERY
CHAPTER 3 DESIGN OF DISPLAYPORT RECEIVER
3.1 OVERVIEW
3.2 PHYSICAL LAYER
3.3 LINK LAYER
3.3.1 OVERALL ARCHITECTURE
3.3.2 AUX CHANNEL
3.3.3 VIDEO TIMING GENERATION
3.3.4 CONTENT PROTECTION
3.3.5 AUDIO TRANSMISSION
3.4 EXPERIMENTAL RESULTS
CHAPTER 4 DESIGN OF EMBEDDED DISPLAYPORT RECEIVER
4.1 OVERVIEW
4.2 PHYSICAL LAYER
4.3 LINK LAYER
4.3.1 OVERALL ARCHITECTURE
4.3.2 MAIN LINK STREAM
4.3.3 CONTENT PROTECTION
4.4 PROPOSED CLOCK RECOVERY SCHEME
4.5 EXPERIMENTAL RESULTS
CHAPTER 5 PROPOSED VIDEO CLOCK SYNTHESIZER AND FREQUENCY CONTROL SCHEME
5.1 MOTIVATION
5.2 PROPOSED VIDEO CLOCK SYNTHESIZER
5.3 BUILDING BLOCKS
5.4 FREQUENCY ERROR COMPENSATION
5.5 EXPERIMENTAL RESULTS
CHAPTER 6 CONCLUSION
BIBLIOGRAPHY
ABSTRACT (KOREAN)
Clock Generator Circuits for Low-Power Heterogeneous Multiprocessor Systems-on-Chip
In this work, concepts and circuits for local clock generation in low-power heterogeneous multiprocessor systems-on-chip (MPSoCs) are researched and developed. The targeted systems feature a globally asynchronous locally synchronous (GALS) clocking architecture and advanced power management functionality, such as fine-grained ultra-fast dynamic voltage and frequency scaling (DVFS). To enable this functionality, compact clock generators with low chip area, low power consumption, a wide output frequency range and the capability for ultra-fast frequency changes are required. They are to be instantiated individually per core.
For this purpose, compact all-digital phase-locked loop (ADPLL) frequency synthesizers are developed. The bang-bang ADPLL architecture is analyzed using a numerical system model and optimized for low jitter accumulation. A 65 nm CMOS ADPLL is implemented, featuring a novel active current bias circuit that compensates the supply voltage and temperature sensitivity of the digitally controlled oscillator (DCO) to reduce the digital tuning effort. Additionally, a 28 nm ADPLL with a new ultra-fast lock-in scheme based on single-shot phase synchronization is proposed.
The core clock is generated by an open-loop method using phase switching between multi-phase DCO clocks at a fixed frequency. This allows instantaneous core frequency changes for ultra-fast DVFS without re-locking the closed-loop ADPLL. The sensitivity of the open-loop clock generator to phase mismatch is analyzed analytically, and a compensation technique using cross-coupled inverter buffers is proposed.
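The open-loop idea can be illustrated with a generic phase-switching model. This is a sketch under assumptions, not the circuit from the thesis: it supposes `num_phases` equally spaced DCO phases and an output edge that waits `integer_div` whole DCO periods plus `step` phase increments each cycle. The function names and the 2 GHz DCO frequency are hypothetical.

```python
from fractions import Fraction


def select_phases(num_phases, step, integer_div, edges):
    """Return (whole DCO periods elapsed, selected phase index) per output edge."""
    schedule = []
    total_steps = 0  # elapsed time in units of T_dco / num_phases
    for _ in range(edges):
        schedule.append(divmod(total_steps, num_phases))
        total_steps += integer_div * num_phases + step
    return schedule


def output_frequency_hz(f_dco_hz, num_phases, step, integer_div):
    """Average output frequency: f_dco * P / (D * P + step)."""
    return Fraction(f_dco_hz * num_phases, integer_div * num_phases + step)


# Assumed example: 8 phases, each output period is 2 + 3/8 DCO periods long.
schedule = select_phases(num_phases=8, step=3, integer_div=2, edges=4)

# Changing only the divider settings changes the output frequency at once;
# the DCO itself (assumed 2 GHz here) keeps running at a fixed frequency.
f_fast = output_frequency_hz(2_000_000_000, 8, 0, 3)
f_other = output_frequency_hz(2_000_000_000, 8, 4, 2)
```

Because the DCO frequency never changes in this model, a new output frequency only requires new `step` and `integer_div` values, which is why no re-locking of the closed loop is involved. The model also shows why phase mismatch matters: every selected phase index contributes its own timing error to the output edges.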
The clock generators show small area (0.0097 mm² (65 nm), 0.00234 mm² (28 nm)) and low power consumption (2.7 mW (65 nm), 0.64 mW (28 nm)), and they provide core clock frequencies from 83 MHz to 666 MHz which can be changed instantaneously. The jitter performance is compliant with DDR2/DDR3 memory interface specifications.
Additionally, high-speed clocks for novel serial on-chip data transceivers are generated. The ADPLL circuits have been verified successfully by 3 test chip implementations. They enable efficient realization of future low-power MPSoCs with advanced power management functionality in deep-submicron CMOS technologies.
In this work, concepts and circuits for local clock generation in low-power heterogeneous multiprocessor systems (MPSoCs) are researched and developed. These systems have a globally asynchronous, locally synchronous architecture as well as power management functionality, such as fine-grained, fast scaling of voltage and clock frequency (DVFS). To realize this functionality, compact clock generators are needed that occupy a small chip area, consume little power, generate a wide range of output frequencies and can change them very quickly. They are to be integrated individually per processor core. For this purpose, compact all-digital phase-locked loops (ADPLLs) are developed, with a bang-bang ADPLL architecture modeled numerically and optimized for low jitter accumulation. A 65 nm CMOS ADPLL is implemented that includes a novel compensation circuit for the digitally controlled oscillator (DCO) to reduce its sensitivity to supply voltage and temperature. In addition, a 28 nm CMOS ADPLL with a new technique for fast lock-in using a phase synchronizer is realized. The processor clock is generated by a novel phase-multiplexing and frequency-division method, which makes it possible to change the clock frequency immediately in order to realize fast DVFS. The sensitivity of this frequency generator to phase mismatch is analyzed theoretically and compensated by using cross-coupled clock buffers. The clock generators developed here have a small chip area (0.0097 mm² (65 nm), 0.00234 mm² (28 nm)) and low power consumption (2.7 mW (65 nm), 0.64 mW (28 nm)). They provide frequencies from 83 MHz to 666 MHz, which can be changed immediately. The circuits meet the jitter specifications of DDR2/DDR3 memory interfaces. In addition, fast clocks for novel serial on-chip links can be generated. The ADPLL circuits have been successfully tested in 3 test chips. They enable the efficient realization of future MPSoCs with power management in state-of-the-art CMOS technologies.
DESIGN AND CHARACTERIZATION OF LOW-POWER LOW-NOISE ALL-DIGITAL SERIAL LINK FOR POINT-TO-POINT COMMUNICATION IN SOC
The fully digital implementation of serial links has recently emerged as a viable alternative to their classical analogue counterparts. Indeed, reducing the analogue content in favour of digital content has become more attractive because it offers lower power consumption, lower sensitivity to noise and better scalability across multiple technologies and platforms with only minor modifications. In addition, describing the circuit in a hardware description language makes it highly flexible: all design parameters can be reprogrammed in a very short time, whereas analogue designs must be redesigned at transistor level for any parameter change. This can radically reduce cost and time-to-market by saving a significant amount of development time. However, besides these considerable advantages, the fully digital architecture poses several design challenges.
Fast Fourier transforms on energy-efficient application-specific processors
Many of the current applications used in battery-powered devices come from the digital signal processing, telecommunication, and multimedia domains. Traditionally, application-specific fixed-function circuits in the form of application-specific integrated circuits (ASICs) have been used in these designs to reach the required performance and energy efficiency. The complexity of these applications has increased over the years, and design complexity has increased even faster, which implies increased design time. At the same time, there are more and more standards to be supported, so using optimised fixed-function implementations for all the functions in all the standards is impractical. The non-recurring engineering costs for integrated circuits have also increased significantly, so manufacturers can afford fewer chip iterations. Although tailoring the circuit for a specific application provides the best performance and/or energy efficiency, such an approach lacks flexibility. For example, if an error is found after manufacturing, an expensive chip iteration is required. In addition, new functionality cannot be added afterwards to support the evolution of standards.
Flexibility can be obtained with software-based implementation technologies. Unfortunately, general-purpose processors do not provide the energy efficiency of fixed-function circuit designs. A useful trade-off between flexibility and performance is an implementation based on application-specific processors (ASPs), where programmability provides the flexibility and computational resources customised for the given application provide the performance.
In this thesis, application-specific processors are considered, using the fast Fourier transform as the representative algorithm. The architectural template used here is the transport-triggered architecture (TTA), which resembles very long instruction word machines but in which the operand execution resembles data flow machines rather than traditional operand triggering. The developed TTA processors exploit the inherent parallelism of the application. In addition, several characteristics of the application have been identified and are exploited by developing customised functional units that speed up execution. Several customisations are proposed for the data path of the processor, but it is also important to match the memory bandwidth to the computation speed. This calls for a memory organisation that supports parallel memory accesses. The proposed optimisations have been used to improve the energy efficiency of the processor, and experiments show that a programmable solution can have energy efficiency comparable to fixed-function ASIC designs.
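For reference, the structure of the algorithm that such a processor accelerates can be sketched in a few lines. The code below is a textbook iterative radix-2 decimation-in-time FFT and is not the implementation from the thesis; the function name `fft_radix2` is an assumption. Within each stage the butterflies are independent of one another, which is the kind of inherent parallelism a customised data path and parallel memory organisation can exploit.

```python
import cmath


def fft_radix2(x):
    """Iterative radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1

    # Bit-reversal permutation of the input samples.
    a = [0j] * n
    for i in range(n):
        j = int(format(i, f"0{bits}b")[::-1], 2) if bits else 0
        a[j] = complex(x[i])

    # log2(n) stages; every butterfly inside one stage is independent.
    size = 2
    while size <= n:
        half = size // 2
        w_step = cmath.exp(-2j * cmath.pi / size)
        for start in range(0, n, size):
            w = 1 + 0j  # twiddle factor for this butterfly
            for k in range(half):
                upper = a[start + k]
                lower = w * a[start + k + half]
                a[start + k] = upper + lower
                a[start + k + half] = upper - lower
                w *= w_step
        size *= 2
    return a


impulse_spectrum = fft_radix2([1, 0, 0, 0])
constant_spectrum = fft_radix2([1, 1, 1, 1])
```

Each butterfly reads and writes two operands per stage, so the memory system must sustain that access rate for the data path to stay busy. This is one way to read the abstract's point about matching memory bandwidth to computation speed.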