44 research outputs found
HIGH PERFORMANCE CLOCK DISTRIBUTION FOR HIGH-SPEED VLSI SYSTEMS
Tohoku University
Circuit delay optimization by buffering the logic gates
With ongoing miniaturization, interconnect delay is becoming increasingly important in circuits. To reduce this delay, buffer insertion must be performed during both logic synthesis and physical synthesis. This optimization is often based on dynamic programming. In this thesis, the branch-and-bound technique is used instead, and the problem is solved for the specific case of balanced buffer trees, where all loads have the same required time and the same capacitance. A mathematical analysis accounts for a variety of design issues such as topology, the buffer library, and the phase change introduced by inverters. By combining dynamic programming with branch-and-bound techniques, a hybrid method is presented that improves run time while keeping memory usage reasonable. The fundamental mathematical and algorithmic concepts used in this thesis can be employed to generalize the proposed method to a set of loads with different capacitances and required times.
Clock Tree and Flip-flop Co-optimization for Reducing Power Consumption and Power/Ground Noise of Integrated Circuits and Systems
Thesis (Ph.D.), Seoul National University Graduate School, College of Engineering, Department of Electrical and Computer Engineering, August 2017. Taewhan Kim.
For very-large-scale integration (VLSI) circuits, the activation of all flip-flops that are used to store data is synchronized by clock signals delivered through clock networks. Because the clock signal switches at a very high frequency, the dynamic power consumed on clock networks takes a considerable portion of the total power consumption of the circuits. The largest share of the power consumed in the clock networks comes from the flip-flops and the buffers that drive the flip-flops at the clock network boundary. In addition, the requirement of simultaneously activating all flip-flops in synchronous circuits induces a high peak power/ground noise (i.e., voltage drop) at the clock boundary.
In this regard, this thesis addresses two new problems: the problem of reducing the clock power consumption at the clock network boundary, and the problem of reducing the peak current at the clock network boundary. Unlike prior works, which have considered the optimization of flip-flops and clock buffers separately, our approach takes into account the co-optimization of flip-flops and clock buffers. Precisely, we propose four different types of hardware component that can implement a set of flip-flops and their driving buffer as a single unit.
The key idea for the derivation of the four types of clock boundary component is that one of the inverters in the driving buffer and one of the inverters in each flip-flop can be combined and removed without changing the functionality of the flip-flops. Consequently, we have more freedom to select (i.e., allocate) clock boundary components that can reduce the power consumption or peak current under a timing constraint. We have implemented our approach of clock boundary optimization under a bounded clock skew constraint and tested it with ISCAS 89 benchmark circuits. The experimental results confirm that our approach is able to reduce the clock power consumption by 7.9% to 10.2% and power/ground noise by 27.7% to 30.9% on average.
Chapter 1 Introduction
1.1 Clock Signal
1.2 Metrics of Clock Design
1.3 Clock Network Topologies
1.4 Multibit Flip-flop
1.5 Simultaneous Switching Noise
1.6 Contributions of This Dissertation
Chapter 2 Clock Tree and Flip-flop Co-optimization for Reducing Power Consumption
2.1 Introduction
2.2 Types of Boundary Optimization
2.3 Analysis of Four Types of Flip-flop
2.3.1 Internal Power Comparison
2.3.2 Characterization of Power Consumption
2.4 Problem Formulation
2.5 The Proposed Algorithm
2.5.1 Independence Assumption
2.5.2 BoundaryMin Algorithm
2.6 Experimental Results
2.6.1 Experimental Setup
2.6.2 Clock Tree Boundary Optimization Results
2.6.3 Capacitance Analysis on Flip-flops
2.6.4 Slew and Skew Analysis
2.6.5 Window Width Analysis
2.7 Conclusions
Chapter 3 Clock Tree and Flip-flop Co-optimization for Reducing Power/Ground Noise
3.1 Introduction
3.2 Current Characteristic of Four Types of Flip-flop
3.3 Motivational Example
3.4 Problem Formulation
3.5 Proposed Algorithm
3.5.1 An Overview
3.5.2 Superposition of Current Flows
3.5.3 Formulation to Instance of MOSP Problem
3.5.4 Selecting Target Power Grid Points
3.5.5 Consideration of Reducing Power Consumption
3.6 Experimental Results
3.7 Summary
Chapter 4 Conclusion
4.1 Clock Buffer and Flip-flop Co-optimization for Reducing Power Consumption
4.2 Clock Buffer and Flip-flop Co-optimization for Reducing Power/Ground Noise
Abstract (in Korean)
Cross-Layer Pathfinding for Off-Chip Interconnects
Off-chip interconnects for integrated circuits (ICs) today induce a diverse design space, spanning many different applications that require transmission of data at various bandwidths, latencies and link lengths. Off-chip interconnect design solutions are also variously sensitive to system performance, power and cost metrics, while also having a strong impact on these metrics. The costs associated with off-chip interconnects include die area, package (PKG) and printed circuit board (PCB) area, technology and bill of materials (BOM). Choices made regarding off-chip interconnects are fundamental to product definition, architecture, design implementation and technology enablement. Given their cross-layer impact, it is imperative that a cross-layer approach be employed to architect and analyze off-chip interconnects up front, so that a top-down design flow can comprehend the cross-layer impacts and correctly assess the system performance, power and cost tradeoffs for off-chip interconnects. Chip architects are not exposed to all the tradeoffs at the physical and circuit implementation or technology layers, and often lack the tools to accurately assess off-chip interconnects. Furthermore, the collaterals needed for a detailed analysis are often lacking when the chip is architected; these include circuit design and layout, PKG and PCB layout, and physical floorplan and implementation. To address the need for a framework that enables architects to assess the system-level impact of off-chip interconnects, this thesis presents power-area-timing (PAT) models for off-chip interconnects, optimization and planning tools with the appropriate abstraction using these PAT models, and die/PKG/PCB co-design methods that help expose the off-chip interconnect cross-layer metrics to the die/PKG/PCB design flows. 
Together, these models, tools and methods enable cross-layer optimization that allows for a top-down definition and exploration of the design space and helps converge on the correct off-chip interconnect implementation and technology choice. The tools presented cover off-chip memory interfaces for mobile and server products, silicon photonic interfaces, 2.5D silicon interposers and 3D through-silicon vias (TSVs). The goal of the cross-layer framework is to assess the key metrics of the interconnect (such as timing, latency, active/idle/sleep power, and area/cost) at an appropriate level of abstraction, across the layers of the design flow. In addition to signal interconnects, this thesis also explores the need for such cross-layer pathfinding for power distribution networks (PDN), where the system-on-chip (SoC) floorplan and pinmap must be optimized before the collateral layouts for PDN analysis are ready. Altogether, the developed cross-layer pathfinding methodology for off-chip interconnects enables more rapid and thorough exploration of a vast design space of off-chip parallel and serial links, inter-die and inter-chiplet links and silicon photonics. Such exploration will pave the way for off-chip interconnect technology enablement that is optimized for system needs. The basis of the framework can be extended to cover other interconnect technologies as well, since it fundamentally relates to system-level metrics that are common to all off-chip interconnects.
Physical Design Methodologies for Low Power and Reliable 3D ICs
As the semiconductor industry struggles to maintain its momentum along the path set by Moore's Law, three-dimensional integrated circuit (3D IC) technology has emerged as a promising solution to achieve higher integration density, better performance, and lower power consumption. However, despite its significant improvement in electrical performance, 3D IC technology presents several serious physical design challenges. In this dissertation, we investigate physical design methodologies for 3D ICs with a primary focus on two areas: low-power 3D clock tree design, and reliability degradation modeling and management.
Clock trees are essential parts of digital systems and dissipate a large amount of power due to their high capacitive loads. The majority of existing 3D clock tree designs focus on minimizing the total wire length, which produces sub-optimal results for power optimization. In this dissertation, we formulate a 3D clock tree design flow which directly optimizes for clock power. We also investigate the design methodology for clock gating a 3D clock tree, which uses shutdown gates to selectively turn off unnecessary clock activity. Unlike in 2D ICs, where shutdown gates are commonly assumed to be cheap and thus applicable at every clock node, shutdown gates in 3D ICs introduce additional control TSVs, which compete with clock TSVs for placement resources. We explore design methodologies that produce the optimal allocation and placement of clock and control TSVs so that the clock power is minimized. We show that the proposed synthesis flow saves significant clock power while accounting for the available TSV placement area.
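The link between capacitive load, clock gating and clock power described above can be illustrated with the standard switching-power relation P = alpha * C * Vdd^2 * f. The following is a minimal sketch, not the dissertation's flow; the function name and all numeric values (100 pF, 1 V, 1 GHz, 40% activity) are hypothetical.

```python
def dynamic_power(alpha, cap_farads, vdd_volts, freq_hz):
    """Switching power in watts: P = alpha * C * Vdd^2 * f.

    alpha is the activity factor (1.0 for a free-running clock net,
    lower when clock gating suppresses some of the cycles).
    """
    return alpha * cap_farads * vdd_volts ** 2 * freq_hz


# Hypothetical clock network: 100 pF total load, 1 V supply, 1 GHz clock.
ungated_w = dynamic_power(alpha=1.0, cap_farads=100e-12, vdd_volts=1.0, freq_hz=1e9)

# Same network if gating leaves the clock active in 40% of the cycles.
gated_w = dynamic_power(alpha=0.4, cap_farads=100e-12, vdd_volts=1.0, freq_hz=1e9)

print(f"ungated: {ungated_w:.3f} W, gated: {gated_w:.3f} W")
```

Because power scales linearly with both capacitance and activity, reducing either the switched load or the fraction of active cycles lowers clock power proportionally.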
Vertical integration also brings new reliability challenges, including TSV electromigration (EM) and several other reliability loss mechanisms caused by TSV-induced stress. These reliability loss models involve complex inter-dependencies between electrical and thermal conditions, which have not been investigated in the past. In this dissertation we set up an electrical/thermal/reliability co-simulation framework to capture the transient behavior of reliability loss in 3D ICs. We further derive and validate an analytical reliability objective function that can be integrated into the 3D placement design flow. The reliability-aware placement scheme enables co-design and co-optimization of both the electrical and reliability properties, and thus improves both the circuit's performance and its lifetime. Our electrical/reliability co-design scheme avoids unnecessary design cycles and the application of ad-hoc fixes that lead to sub-optimal performance.
Vertical integration also enables stacking DRAM on top of the CPU, providing high bandwidth and short latency. However, non-uniform voltage fluctuations and local thermal hotspots in the CPU layers are coupled into the DRAM layers, causing a non-uniform distribution of bit-cell leakage (and thereby of bit flips). We propose a performance-power-resilience simulation framework to capture DRAM soft errors in 3D multi-core CPU systems. In addition, a dynamic resilience management (DRM) scheme is investigated, which adaptively tunes the CPU's operating points to adjust the DRAM's voltage noise and thermal condition at runtime. The DRM uses dynamic frequency scaling to achieve a resilience borrow-in strategy, which effectively enhances the DRAM's resilience without sacrificing performance.
The proposed physical design methodologies should act as important building blocks for 3D ICs and push 3D ICs toward mainstream acceptance in the near future.
Case Studies on Clock Gating and Local Routing for VLSI Clock Mesh
The clock is the key synchronizing element in all synchronous digital systems. The difference in clock arrival time between sink points is called the clock skew. This uncertainty in arrival times limits the operating frequency and may cause functional errors.
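The skew definition above can be made concrete with a small sketch. The sink names and arrival times below are hypothetical, and the function reports the maximum (global) skew, i.e. the spread between the latest and earliest arrival.

```python
def clock_skew(arrival_times):
    """Maximum skew: latest minus earliest clock arrival among the sinks."""
    times = list(arrival_times.values())
    return max(times) - min(times)


# Hypothetical clock arrival times at three flip-flop clock pins, in picoseconds.
arrivals_ps = {"ff_a": 120.0, "ff_b": 131.5, "ff_c": 118.0}

skew_ps = clock_skew(arrivals_ps)
print(f"maximum skew: {skew_ps} ps")  # 131.5 - 118.0 = 13.5 ps
```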
Clock routing techniques can be broadly categorized into 'balanced tree' and 'fixed mesh' methods. Skew and delay are higher with the balanced tree method than with the fixed mesh method. Although a fixed mesh inherently uses more wire length, the redundancy created by loops in a mesh structure reduces undesired delay variations. The fixed mesh method uses a single mesh over the entire chip, but it is hard to introduce clock gating in a single clock mesh. This thesis deals with the introduction of 'reconfigurability' by using control structures such as transmission gates between sub-clock meshes, thus enabling clock gating in a clock mesh. By using the optimum sizing of the PMOS and NMOS transistors of the transmission gate (SZF) and the optimum number of transmission gates between sub-clock meshes (NTG) for a 4x4 reconfigurable mesh, the average of the maximum skew over all benchmarks is reduced by 18.12 percent compared to a clock mesh structure in which no transmission gates are used between the sub-clock meshes (reconfigurable mesh with NTG = 0).
Further, the research deals with a 'modified zero skew method' to connect synchronous flip-flops or sink points in the circuit to the clock grids of the clock mesh. Wire length reduction algorithms can be applied to reduce the wire length used for a local clock distribution network. The modified version of the 'zero skew method' of local clock routing, which is based on Elmore delay balancing, aims at minimizing wire length for the given bounded skew of a clock distribution network (CDN) using a clock mesh and an H-tree. The results of the 'modified zero skew method' (HC_MZSK) show an average local wire length reduction of 17.75 percent over all ISPD benchmarks compared to the direct connection method. In most test cases, HC_MZSK also yields a smaller maximum skew than other connection methods such as direct connection and modified AHHK. Thus, HC_MZSK for local routing reduces both the wire length and the maximum skew.
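To illustrate what Elmore delay balancing means, the sketch below computes Elmore delays on a small RC tree and checks that two sinks are balanced. It is a generic textbook calculation, not the thesis's HC_MZSK algorithm; the tree, node names and R/C values are hypothetical. Resistances are in ohms and capacitances in femtofarads, so delays come out in femtoseconds.

```python
def elmore_delays(children, res, cap, root):
    """Elmore delay from the root to every node of an RC tree.

    children: node -> list of child nodes
    res:      node -> resistance of the wire from its parent (root excluded)
    cap:      node -> capacitance lumped at the node
    The delay to a node is the sum, over each wire on the root-to-node
    path, of that wire's resistance times the capacitance downstream of it.
    """
    downstream = {}

    def total_cap(node):
        # Capacitance at this node plus everything below it.
        downstream[node] = cap[node] + sum(total_cap(c) for c in children.get(node, []))
        return downstream[node]

    total_cap(root)
    delay = {root: 0.0}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            delay[child] = delay[node] + res[child] * downstream[child]
            stack.append(child)
    return delay


# Hypothetical local tree: a mesh tap drives a merge point m, which feeds two sinks.
children = {"tap": ["m"], "m": ["s1", "s2"]}
res = {"m": 50.0, "s1": 100.0, "s2": 200.0}           # ohms
cap = {"tap": 0.0, "m": 2.0, "s1": 10.0, "s2": 5.0}   # femtofarads

delays_fs = elmore_delays(children, res, cap, "tap")
sink_skew_fs = abs(delays_fs["s1"] - delays_fs["s2"])
print(delays_fs["s1"], delays_fs["s2"], sink_skew_fs)
```

Here the branch to s1 has half the resistance but twice the load of the branch to s2, so both R*C products match and the two sinks see the same Elmore delay. A zero-skew style method chooses wire lengths so that this balance holds at every merge point.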
Injection locked ring oscillator design for application in Direct Time of Flight LIDAR
The diploma thesis provides an introduction to Direct Time of Flight LIDAR systems and the Time to Digital Converters used in these systems. It discusses the problem of clock distribution in LIDAR Time to Digital Converter arrays, and examines one of the possible solutions to this problem based on injection locked oscillators. The injection locking phenomenon is thoroughly mathematically described and a Matlab model of an injection locked ring oscillator is presented, confirming the analytic predictions. In ONK65 processing technology, an injection locked ring oscillator biased by a delay locked loop has been designed specifically for application in Time to Digital Converters for LIDAR systems. The designed oscillator has been verified by computer simulations taking process, voltage and temperature variations into account. It offers the specified time resolution of 50 picoseconds and half the clock jitter of an equivalent free-running oscillator in the given processing technology.
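As a rough illustration of what a 50 picosecond time resolution means for a direct time-of-flight measurement, the sketch below converts a time step into a range step using distance = c * t / 2 (the light travels to the target and back). The function name is hypothetical and the calculation ignores jitter and other error sources.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by definition of the metre


def range_resolution_m(time_resolution_s):
    """Range step corresponding to one time step of a direct ToF measurement."""
    # Divide by two because the measured time covers the round trip.
    return SPEED_OF_LIGHT_M_PER_S * time_resolution_s / 2


resolution_m = range_resolution_m(50e-12)
print(f"{resolution_m * 1000:.2f} mm per 50 ps step")  # roughly 7.5 mm
```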
Sincronização em sistemas integrados a alta velocidade (Synchronization in High-Speed Integrated Systems)
Doctorate in Electrical Engineering
Distributing the clock simultaneously everywhere (low skew) and periodically everywhere (low jitter) in high-performance Integrated Circuits (ICs) has become an increasingly difficult and time-consuming task, due to technology scaling. As transistor dimensions shrink and more functionality is packed into an IC, clock precision becomes increasingly affected by Process, Voltage and Temperature (PVT) variations. This thesis addresses the problem of clock uncertainty in high-performance ICs, in order to determine the limits of the synchronous design paradigm.
In pursuit of this main goal, this thesis proposes four new uncertainty models, with different underlying principles and scopes. The first model targets uncertainty in static CMOS inverters. The main advantage of this model is that it depends only on parameters that can easily be obtained. Thus, it can provide information on upcoming constraints very early in the design stage. The second model addresses uncertainty in repeaters with RC interconnects, allowing the designer to optimise the repeater's size and spacing, for a given uncertainty budget, with low computational effort. The third model can be used to predict jitter accumulation in cascaded repeaters, such as clock trees or delay lines. Because it takes into consideration correlations among variability sources, it can also be useful to promote floorplan-based power and clock distribution design in order to minimise jitter accumulation. A fourth model is proposed to analyse uncertainty in systems with multiple synchronous domains. It can be easily incorporated in an automatic tool to determine the best topology for a given application or to evaluate the system's tolerance to power-supply noise.
Finally, using the proposed models, this thesis discusses clock precision trends. Results show that limits in clock precision are ultimately imposed by dynamic uncertainty, which is expected to continue increasing with technology scaling. Therefore, it advocates the search for solutions at other abstraction levels, and not only at the physical level, that may increase system performance with a smaller impact on the assumptions behind the synchronous design paradigm.
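Why correlation matters for jitter accumulation in cascaded repeaters can be seen from the variance of a sum of random delays. The sketch below is the generic textbook relation, not the thesis's third model; it assumes, for illustration only, that every pair of stages shares the same correlation coefficient rho between 0 and 1, and the per-stage jitter values are hypothetical.

```python
import math


def accumulated_jitter(stage_sigmas, rho):
    """RMS jitter at the end of a repeater chain.

    stage_sigmas: per-stage RMS jitter values (same unit, e.g. ps)
    rho:          assumed uniform correlation between any two stages
    Var(sum) = sum(sigma_i^2) + 2 * rho * sum over i<j of sigma_i * sigma_j
    """
    variance = sum(s * s for s in stage_sigmas)
    for i in range(len(stage_sigmas)):
        for j in range(i + 1, len(stage_sigmas)):
            variance += 2 * rho * stage_sigmas[i] * stage_sigmas[j]
    return math.sqrt(variance)


# Hypothetical chain of four repeaters, each contributing 1 ps RMS jitter.
stages_ps = [1.0, 1.0, 1.0, 1.0]

uncorrelated_ps = accumulated_jitter(stages_ps, rho=0.0)      # grows as sqrt(N)
fully_correlated_ps = accumulated_jitter(stages_ps, rho=1.0)  # grows as N
print(uncorrelated_ps, fully_correlated_ps)
```

With independent noise sources the accumulated jitter grows with the square root of the number of stages, while fully correlated sources make it grow linearly, which is why the degree of correlation between stages affects how quickly uncertainty builds up along a clock path.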
Design-for-delay-testability techniques for high-speed digital circuits
The importance of delay faults is enhanced by the ever-increasing clock rates and decreasing geometry sizes of today's circuits. This thesis focuses on the development of Design-for-Delay-Testability (DfDT) techniques for high-speed circuits and embedded cores. The rising costs of IC testing, and in particular the costs of Automatic Test Equipment, are major concerns for the semiconductor industry. To reverse the trend of rising testing costs, DfDT is becoming more and more important.