35 research outputs found
Sincronização em sistemas integrados a alta velocidade (Synchronisation in high-speed integrated systems)
Doctoral thesis in Electrical Engineering.
Distributing the clock simultaneously everywhere (low skew) and periodically
everywhere (low jitter) in high-performance Integrated Circuits (ICs)
has become an increasingly difficult and time-consuming task, due to technology
scaling. As transistor dimensions shrink and more functionality is
packed into an IC, clock precision becomes increasingly affected by Process,
Voltage and Temperature (PVT) variations. This thesis addresses the
problem of clock uncertainty in high-performance ICs, in order to determine
the limits of the synchronous design paradigm.
In pursuit of this main goal, this thesis proposes four new uncertainty models,
with different underlying principles and scopes. The first model targets
uncertainty in static CMOS inverters. The main advantage of this model
is that it depends only on parameters that can easily be obtained. Thus,
it can provide information on upcoming constraints very early in the design
stage. The second model addresses uncertainty in repeaters with RC interconnects,
allowing the designer to optimise the repeater's size and spacing,
for a given uncertainty budget, with low computational effort. The third
model can be used to predict jitter accumulation in cascaded repeaters, such as
clock trees or delay lines. Because it takes into consideration correlations
among variability sources, it can also be useful to promote floorplan-based
power and clock distribution design in order to minimise jitter accumulation.
A fourth model is proposed to analyse uncertainty in systems with multiple
synchronous domains. It can be easily incorporated in an automatic tool
to determine the best topology for a given application or to evaluate the
system's tolerance to power-supply noise.
Finally, using the proposed models, this thesis discusses clock precision
trends. Results show that limits in clock precision are ultimately imposed
by dynamic uncertainty, which is expected to continue increasing with technology
scaling. Therefore, it advocates the search for solutions at other
abstraction levels, and not only at the physical level, that may increase
system performance with a smaller impact on the assumptions behind the
synchronous design paradigm.
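The effect of correlation on jitter accumulation in a repeater cascade can be illustrated with the textbook variance-of-a-sum identity. The sketch below is not the model proposed in the thesis; the function name, the single common correlation coefficient `rho`, and the 1 ps per-stage jitter are assumptions chosen only for illustration.

```python
import math


def accumulated_jitter(sigmas, rho):
    """RMS jitter after a cascade of stages (illustrative, not the thesis model).

    sigmas: per-stage RMS jitter values, all in the same unit (e.g. ps).
    rho: assumed common correlation coefficient between any two stages.
         It must be large enough that the resulting variance is non-negative.
    """
    # Sum of the individual variances.
    variance = sum(s * s for s in sigmas)
    # Pairwise covariance terms: 2 * rho * sigma_i * sigma_j for i < j.
    for i in range(len(sigmas)):
        for j in range(i + 1, len(sigmas)):
            variance += 2.0 * rho * sigmas[i] * sigmas[j]
    return math.sqrt(variance)


stages = [1.0] * 16                              # hypothetical: 16 stages, 1 ps each
uncorrelated = accumulated_jitter(stages, 0.0)   # grows with sqrt(N)
correlated = accumulated_jitter(stages, 1.0)     # grows linearly with N
print(uncorrelated, correlated)
```

Under these assumptions, fully correlated sources accumulate linearly with the number of stages, while uncorrelated sources accumulate with its square root, which is why the correlation structure matters for power and clock distribution planning.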
High-Speed Clocking Deskewing Architecture
As CMOS technology continues to scale into the deep sub-micron regime, the demand
for higher frequencies and higher levels of integration poses a significant challenge for the clock generation and distribution design of microprocessors. Hence, skew optimization schemes are necessary to limit clock inaccuracies to a small fraction of the clock period. In this thesis, a crude deskew buffer (CDB) is designed to facilitate an adaptive deskewing scheme that reduces the clock skew in an ASIC clock network under manufacturing process,
supply voltage, and temperature (PVT) variations. The crude deskew buffer adopts a DLL structure and operates at a 1GHz nominal clock frequency with an operating frequency range of 800MHz to 1.2GHz. A phase resolution of approximately 91.6ps is achieved for all simulation conditions, including various process corners and temperature variations. When the crude deskew buffer is applied to seven ASIC clock networks, each under various
PVT variations, a maximum reduction of 67.1% in absolute maximum clock skew is achieved. Furthermore, the maximum phase difference between all the clock signals in the seven networks is reduced from 957.1ps to 311.9ps, a reduction of 67.4%. Overall, the CDB serves two important purposes in the proposed deskewing methodology: reducing the absolute maximum clock skew and synchronizing all the clock signals to within a certain limit for the fine deskewing scheme. By generating various clock phases, the CDB can also be potentially useful in high-speed debugging and testing, where the clock duty cycle can be adjusted accordingly. Various positive and negative duty cycle values can be generated based on the phase resolution and the number of clock phases being “hot swapped”. For a
500ps duty cycle, the following values can be achieved for both the positive and negative duty cycle: 224ps, 316ps, 408ps, 592ps, 684ps, and 776ps.
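The figures quoted in this abstract can be checked with a few lines of arithmetic. The sketch below assumes that the listed duty-cycle values come from shifting a 500ps high time by one to three phase steps of 92ps (the abstract states a resolution of approximately 91.6ps; the rounded step and the three-step limit are inferred from the listed values, and the function name is hypothetical).

```python
def adjusted_high_times(nominal_ps, step_ps, max_steps):
    """High-time values reachable by moving one clock edge by 1..max_steps phase steps."""
    shorter = [nominal_ps - k * step_ps for k in range(max_steps, 0, -1)]
    longer = [nominal_ps + k * step_ps for k in range(1, max_steps + 1)]
    return shorter + longer


# Assumed: 500 ps nominal high time, 92 ps step, up to 3 steps either way.
values = adjusted_high_times(500, 92, 3)
print(values)  # [224, 316, 408, 592, 684, 776]

# Relative reduction of the maximum phase difference quoted in the abstract.
reduction_pct = (957.1 - 311.9) / 957.1 * 100.0
print(round(reduction_pct, 1))  # 67.4
```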
Precise Timing of Digital Signals: Circuits and Applications
With the rapid advances in process technologies, the performance of state-of-the-art integrated circuits is improving steadily. The drive for higher performance is accompanied by an increased emphasis on meeting timing constraints, not only at the design phase but during device operation as well. Fortunately, technology advancements allow for even more precise control of the timing of digital signals, an advantage that can be used to address some of the emerging timing issues. In this thesis, circuit and architectural techniques for the precise timing of digital signals are explored. These techniques are demonstrated in applications addressing timing issues in modern digital systems.
A methodology for slow-speed timing characterization of high-speed pipelined datapaths is proposed. The technique uses a clock-timing circuit to create shifted versions of a slow-speed clock. These clocks control the data flow in the pipeline in the test mode. Test results show that the design provides an average timing resolution of 52.9ps in 0.18μm CMOS technology. Results also demonstrate the ability of the technique to track the performance of high-speed pipelines at a reduced clock frequency and to test the clock-timing circuit itself.
In order to achieve higher resolutions than that of an inverter/buffer stage, a differential (vernier) delay line is commonly used. To allow for the design of differential delay lines with programmable delays, a digitally controlled delay element is proposed. The delay element is monotonic and achieves a highly linear transfer characteristic (digital code vs. delay). Using the proposed delay element, a sub-1ps resolution is demonstrated experimentally in 0.18μm CMOS.
The proposed delay element with a fixed delay step of 2ps is used to design a high-precision all-digital phase aligner. High-precision phase alignment has many applications in modern digital systems such as high-speed memory controllers, clock-deskew buffers, and delay and phase-locked loops. The design is based on a differential delay line and a variation tolerant phase detector using redundancy. Experimental results show that the phase aligner's range is from -264ps to +247ps which corresponds to an average delay step of approximately 2.43ps. For various input phase difference values, test results show that the difference is reduced to less than 2ps at the output of the phase aligner.
On-chip time measurement is another application that requires precise timing. It has applications in modern automatic test equipment and on-chip characterization of jitter and skew. In order to achieve a short conversion time, a flash time-to-digital converter is proposed. Mismatch between the various delay comparators limits the time measurement precision. This is demonstrated through an experiment in which a 6-bit, 2.5ps resolution flash time-to-digital converter provides an effective resolution of only 4 bits. The converter achieves a maximum conversion rate of 1.25GSa/s.
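The vernier principle mentioned in this abstract, where the resolution equals the difference between two stage delays rather than a single stage delay, can be sketched as follows. This is a generic behavioural model, not the circuit from the thesis; the 52ps and 50ps stage delays and the function name are hypothetical.

```python
def vernier_measure(interval_ps, slow_ps, fast_ps, stages):
    """Behavioural vernier delay line (illustrative only).

    The start edge travels through the slow line and the stop edge, which
    arrives interval_ps later, travels through the fast line. The stop edge
    gains (slow_ps - fast_ps) on the start edge at every stage.
    Returns the first stage at which the stop edge has caught up, or None
    if the interval exceeds the range of the line.
    """
    start_t, stop_t = 0.0, float(interval_ps)
    for n in range(1, stages + 1):
        start_t += slow_ps
        stop_t += fast_ps
        if stop_t <= start_t:
            return n
    return None


# Hypothetical stage delays: resolution = 52 - 50 = 2 ps per stage.
stage = vernier_measure(interval_ps=10, slow_ps=52, fast_ps=50, stages=64)
estimate_ps = stage * (52 - 50)
print(stage, estimate_ps)  # 5 10
```

The measurable range of such a line is the number of stages times the per-stage difference, so finer resolution costs more stages for the same range.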
Cross-Layer Pathfinding for Off-Chip Interconnects
Off-chip interconnects for integrated circuits (ICs) today induce a diverse design space, spanning many different applications that require transmission of data at various bandwidths, latencies and link lengths. Off-chip interconnect design solutions are also variously sensitive to system performance, power and cost metrics, while also having a strong impact on these metrics. The costs associated with off-chip interconnects include die area, package (PKG) and printed circuit board (PCB) area, technology and bill of materials (BOM). Choices made regarding off-chip interconnects are fundamental to product definition, architecture, design implementation and technology enablement. Given their cross-layer impact, it is imperative that a cross-layer approach be employed to architect and analyze off-chip interconnects up front, so that a top-down design flow can comprehend the cross-layer impacts and correctly assess the system performance, power and cost tradeoffs for off-chip interconnects. Chip architects are not exposed to all the tradeoffs at the physical and circuit implementation or technology layers, and often lack the tools to accurately assess off-chip interconnects. Furthermore, the collaterals needed for a detailed analysis are often lacking when the chip is architected; these include circuit design and layout, PKG and PCB layout, and physical floorplan and implementation. To address the need for a framework that enables architects to assess the system-level impact of off-chip interconnects, this thesis presents power-area-timing (PAT) models for off-chip interconnects, optimization and planning tools with the appropriate abstraction using these PAT models, and die/PKG/PCB co-design methods that help expose the off-chip interconnect cross-layer metrics to the die/PKG/PCB design flows. 
Together, these models, tools and methods enable cross-layer optimization that allows for a top-down definition and exploration of the design space and helps converge on the correct off-chip interconnect implementation and technology choice. The tools presented cover off-chip memory interfaces for mobile and server products, silicon photonic interfaces, 2.5D silicon interposers and 3D through-silicon vias (TSVs). The goal of the cross-layer framework is to assess the key metrics of the interconnect (such as timing, latency, active/idle/sleep power, and area/cost) at an appropriate level of abstraction by being able to do this across layers of the design flow. In addition to signal interconnects, this thesis also explores the need for such cross-layer pathfinding for power distribution networks (PDN), where the system-on-chip (SoC) floorplan and pinmap must be optimized before the collateral layouts for PDN analysis are ready. Altogether, the developed cross-layer pathfinding methodology for off-chip interconnects enables more rapid and thorough exploration of a vast design space of off-chip parallel and serial links, inter-die and inter-chiplet links and silicon photonics. Such exploration will pave the way for off-chip interconnect technology enablement that is optimized for system needs. The basis of the framework can be extended to cover other interconnect technologies as well, since it fundamentally relates to system-level metrics that are common to all off-chip interconnects.
Deliverable D4.1: VLC modulation schemes
This report, deliverable D4.1 of the VIDAS project, presents an analysis of different modulation schemes for VLC systems. The deliverable was designed with the final prototype design and application in mind. A detailed analysis of various modulation schemes is carried out, and a robust technique based on direct sequence spread spectrum (DSSS) is adopted. The DSSS technique requires a high bandwidth, but it minimizes the effect of noise. This trade-off is acceptable because the final application does not require a very high data rate, whereas robustness against noise (external lights) is necessary. The analysis is followed by model development using Matlab/Simulink. The performance of both of these systems is compared and evaluated, and some of the simulation results are presented.
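The bandwidth-for-robustness trade-off of DSSS described above can be sketched in a few lines. This is a generic baseband illustration, not the Matlab/Simulink model from the report: the length-7 Barker chip sequence, the bipolar signalling (a real VLC link would use unipolar intensity modulation), the noise level and the function names are all assumptions.

```python
import random


def spread(bits, chips):
    """Map each data bit to +1/-1 and multiply it by the whole chip sequence."""
    out = []
    for b in bits:
        symbol = 1 if b else -1
        out.extend(symbol * c for c in chips)
    return out


def despread(samples, chips):
    """Correlate each chip-length block with the chip sequence and slice the sign."""
    n = len(chips)
    bits = []
    for i in range(0, len(samples), n):
        corr = sum(s * c for s, c in zip(samples[i:i + n], chips))
        bits.append(1 if corr > 0 else 0)
    return bits


chips = [1, 1, 1, -1, -1, 1, -1]   # length-7 Barker sequence (assumed choice)
data = [1, 0, 1, 1, 0]

# Each bit occupies 7 chips, so the signal needs 7 times the bandwidth.
tx = spread(data, chips)

# Additive Gaussian noise standing in for ambient-light interference.
rng = random.Random(0)
rx = [x + rng.gauss(0.0, 1.0) for x in tx]

print(despread(rx, chips))
```

Correlating over all chips of a bit sums the signal coherently while the noise adds incoherently, which is the source of the robustness the report relies on.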
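통계적 주파수 검출기 기반 기준 주파수를 사용하지 않는 클록 및 데이터 복원 회로의 설계 방법론 (Design Methodology for a Referenceless Clock and Data Recovery Circuit Based on a Stochastic Frequency Detector)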
Thesis (Ph.D.) -- Seoul National University Graduate School: Department of Electrical and Computer Engineering, College of Engineering, August 2022. 정덕균.
In this thesis, a design of a high-speed, power-efficient, wide-range clock and data recovery (CDR) circuit without a reference clock is proposed. A frequency acquisition scheme using a stochastic frequency detector (SFD) based on the Alexander phase detector (PD) is utilized for the referenceless operation. Pattern histogram analysis is presented to analyze the frequency acquisition behavior of the SFD and is verified by simulation. Based on the information obtained by pattern histogram analysis, an SFD using autocovariance is proposed. With a direct-proportional path and a digital integral path, the proposed referenceless CDR achieves frequency lock at all measurable conditions, and the measured frequency acquisition time is within 7μs. The prototype chip has been fabricated in a 40-nm CMOS process and occupies an active area of 0.032 mm². The proposed referenceless CDR achieves a BER of less than 10⁻¹² at 32 Gb/s and exhibits an energy efficiency of 1.15 pJ/b at 32 Gb/s with a 1.0 V supply.
CHAPTER 1 INTRODUCTION
1.1 MOTIVATION
1.2 THESIS ORGANIZATION
CHAPTER 2 BACKGROUNDS
2.1 CLOCKING ARCHITECTURES IN SERIAL LINK INTERFACE
2.2 GENERAL CONSIDERATIONS FOR CLOCK AND DATA RECOVERY
2.2.1 OVERVIEW
2.2.2 JITTER
2.2.3 CDR JITTER CHARACTERISTICS
2.3 CDR ARCHITECTURES
2.3.1 PLL-BASED CDR – WITH EXTERNAL REFERENCE CLOCK
2.3.2 DLL/PI-BASED CDR
2.3.3 PLL-BASED CDR – WITHOUT EXTERNAL REFERENCE CLOCK
2.4 FREQUENCY ACQUISITION SCHEME
2.4.1 TYPICAL FREQUENCY DETECTORS
2.4.1.1 DIGITAL QUADRICORRELATOR FREQUENCY DETECTOR
2.4.1.2 ROTATIONAL FREQUENCY DETECTOR
2.4.2 PRIOR WORKS
CHAPTER 3 DESIGN OF THE REFERENCELESS CDR USING SFD
3.1 OVERVIEW
3.2 PROPOSED FREQUENCY DETECTOR
3.2.1 MOTIVATION
3.2.2 PATTERN HISTOGRAM ANALYSIS
3.2.3 INTRODUCTION OF AUTOCOVARIANCE TO STOCHASTIC FREQUENCY DETECTOR
3.3 CIRCUIT IMPLEMENTATION
3.3.1 IMPLEMENTATION OF THE PROPOSED REFERENCELESS CDR
3.3.2 CONTINUOUS-TIME LINEAR EQUALIZER (CTLE)
3.3.3 DIGITALLY-CONTROLLED OSCILLATOR (DCO)
3.4 MEASUREMENT RESULTS
CHAPTER 4 CONCLUSION
APPENDIX A DETAILED FREQUENCY ACQUISITION WAVEFORMS OF THE PROPOSED SFD
BIBLIOGRAPHY
ABSTRACT (IN KOREAN)
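The abstract above builds its frequency detector on the Alexander phase detector. A minimal sketch of the conventional Alexander (bang-bang) decision logic is given below; it does not implement the stochastic frequency detector proposed in the thesis. The function name and the +1/-1 sign convention are assumptions: the edge sample is taken midway between two consecutive data samples, and "early" means the sampling clock leads the data transition.

```python
def alexander_pd(prev_data, edge, curr_data):
    """Conventional Alexander phase detector decision (illustrative sketch).

    prev_data, curr_data: consecutive data samples (0 or 1).
    edge: sample taken midway between them (0 or 1).
    Returns -1 if the clock is early, +1 if it is late, 0 if there is no
    data transition and therefore no phase information.
    """
    if prev_data == curr_data:
        return 0
    # With an early clock the edge sample still sees the old bit;
    # with a late clock it already sees the new bit.
    return 1 if edge == curr_data else -1


# (prev_data, edge, curr_data) triples for a short hypothetical sample stream.
triples = [(0, 0, 1), (1, 1, 1), (1, 0, 0), (0, 1, 1)]
votes = [alexander_pd(*t) for t in triples]
print(votes)  # [-1, 0, 1, 1]
```

Because each decision carries only a sign, a loop built on this detector has to average many decisions; detecting a frequency offset from the statistics of such decisions is the direction the abstract describes.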
High-performance and Low-power Clock Network Synthesis in the Presence of Variation.
Semiconductor technology scaling requires continuous evolution of all aspects of physical
design of integrated circuits. Among the major design steps, clock-network synthesis
has been greatly affected by technology scaling, rendering existing methodologies inadequate.
Clock routing was previously sufficient for smaller ICs, but design difficulty and
structural complexity have greatly increased as interconnect delay and clock frequency increased
in the 1990s. Since a clock network directly influences IC performance and often
consumes a substantial portion of total power, both academia and industry developed synthesis
methodologies to achieve low skew, low power and robustness to PVT variations.
Nevertheless, clock network synthesis under tight constraints is currently the least automated
step in physical design and requires significant manual intervention, undermining
turn-around-time. The need for multi-objective optimization over a large parameter space
and the increasing impact of process variation make clock network synthesis particularly
challenging.
Our work identifies new objectives, constraints and concerns in the clock-network synthesis
for systems-on-chips and microprocessors. To address them, we generate novel
clock-network structures and propose changes in traditional physical-design flows. We
develop new modeling techniques and algorithms for clock power optimization subject
to tight skew constraints in the presence of process variations. In particular, we offer
SPICE-accurate optimizations of clock networks, coordinated to reduce nominal skew below
5 ps, satisfy slew constraints and trade off skew, insertion delay and power, while
tolerating variations. To broaden the scope of clock-network-synthesis optimizations, we
propose new techniques and a methodology to reduce dynamic power consumption by
6.8%-11.6% for large IC designs with macro blocks by integrating clock network synthesis
within global placement. We also present a novel non-tree topology that is 2.3x more
power-efficient than mesh structures. We fuse several clock trees to create large-scale redundancy
in a clock network to bridge the gap between tree-like and mesh-like topologies.
Integrated optimization techniques for high-quality clock networks described in this dissertation
show strong empirical results in experiments with recent industry-released benchmarks
in the presence of process variation. Our software implementations were recognized with
the first-place awards at the ISPD 2009 and ISPD 2010 Clock-Network Synthesis Contests
organized by IBM Research and Intel Research.
Ph.D., Electrical Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/89711/1/ejdjsy_1.pd
Design and Implementation of the New D0 Level-1 Calorimeter Trigger
Increasing luminosity at the Fermilab Tevatron collider has led the D0
collaboration to make improvements to its detector beyond those already in
place for Run IIa, which began in March 2001. One of the cornerstones of this
Run IIb upgrade is a completely redesigned level-1 calorimeter trigger system.
The new system employs novel architecture and algorithms to retain high
efficiency for interesting events while substantially increasing rejection of
background. We describe the design and implementation of the new level-1
calorimeter trigger hardware and discuss its performance during Run IIb data
taking. In addition to strengthening the physics capabilities of D0, this
trigger system will provide valuable insight into the operation of analogous
devices to be used at LHC experiments.
Comment: 43 pages, 20 figures, version published in Nucl. Instrum. and Methods