100 research outputs found

    Pre-bond testable low-power clock tree design for 3D stacked ICs

    Full text link

    High-performance and Low-power Clock Network Synthesis in the Presence of Variation.

    Full text link
    Semiconductor technology scaling requires continuous evolution of all aspects of physical design of integrated circuits. Among the major design steps, clock-network synthesis has been greatly affected by technology scaling, rendering existing methodologies inadequate. Clock routing was previously sufficient for smaller ICs, but design difficulty and structural complexity have greatly increased as interconnect delay and clock frequency increased in the 1990s. Since a clock network directly influences IC performance and often consumes a substantial portion of total power, both academia and industry developed synthesis methodologies to achieve low skew, low power and robustness from PVT variations. Nevertheless, clock network synthesis under tight constraints is currently the least automated step in physical design and requires significant manual intervention, undermining turn-around-time. The need for multi-objective optimization over a large parameter space and the increasing impact of process variation make clock network synthesis particularly challenging. Our work identifies new objectives, constraints and concerns in the clock-network synthesis for systems-on-chips and microprocessors. To address them, we generate novel clock-network structures and propose changes in traditional physical-design flows. We develop new modeling techniques and algorithms for clock power optimization subject to tight skew constraints in the presence of process variations. In particular, we offer SPICE-accurate optimizations of clock networks, coordinated to reduce nominal skew below 5 ps, satisfy slew constraints and trade-off skew, insertion delay and power, while tolerating variations. To broaden the scope of clock-network-synthesis optimizations, we propose new techniques and a methodology to reduce dynamic power consumption by 6.8%-11.6% for large IC designs with macro blocks by integrating clock network synthesis within global placement. We also present a novel non-tree topology that is 2.3x more power-efficient than mesh structures. We fuse several clock trees to create large-scale redundancy in a clock network to bridge the gap between tree-like and mesh-like topologies. Integrated optimization techniques for high-quality clock networks described in this dissertation strong empirical results in experiments with recent industry-released benchmarks in the presence of process variation. Our software implementations were recognized with the first-place awards at the ISPD 2009 and ISPD 2010 Clock-Network Synthesis Contests organized by IBM Research and Intel Research.Ph.D.Electrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/89711/1/ejdjsy_1.pd

    Timing Closure in Chip Design

    Get PDF
    Achieving timing closure is a major challenge to the physical design of a computer chip. Its task is to find a physical realization fulfilling the speed specifications. In this thesis, we propose new algorithms for the key tasks of performance optimization, namely repeater tree construction; circuit sizing; clock skew scheduling; threshold voltage optimization and plane assignment. Furthermore, a new program flow for timing closure is developed that integrates these algorithms with placement and clocktree construction. For repeater tree construction a new algorithm for computing topologies, which are later filled with repeaters, is presented. To this end, we propose a new delay model for topologies that not only accounts for the path lengths, as existing approaches do, but also for the number of bifurcations on a path, which introduce extra capacitance and thereby delay. In the extreme cases of pure power optimization and pure delay optimization the optimum topologies regarding our delay model are minimum Steiner trees and alphabetic code trees with the shortest possible path lengths. We presented a new, extremely fast algorithm that scales seamlessly between the two opposite objectives. For special cases, we prove the optimality of our algorithm. The efficiency and effectiveness in practice is demonstrated by comprehensive experimental results. The task of circuit sizing is to assign millions of small elementary logic circuits to elements from a discrete set of logically equivalent, predefined physical layouts such that power consumption is minimized and all signal paths are sufficiently fast. In this thesis we develop a fast heuristic approach for global circuit sizing, followed by a local search into a local optimum. Our algorithms use, in contrast to existing approaches, the available discrete layout choices and accurate delay models with slew propagation. The global approach iteratively assigns slew targets to all source pins of the chip and chooses a discrete layout of minimum size preserving the slew targets. In comprehensive experiments on real instances, we demonstrate that the worst path delay is within 7% of its lower bound on average after a few iterations. The subsequent local search reduces this gap to 2% on average. Combining global and local sizing we are able to size more than 5.7 million circuits within 3 hours. For the clock skew scheduling problem we develop the first algorithm with a strongly polynomial running time for the cycle time minimization in the presence of different cycle times and multi-cycle paths. In practice, an iterative local search method is much more efficient. We prove that this iterative method maximizes the worst slack, even when restricting the feasible schedule to certain time intervals. Furthermore, we enhance the iterative local approach to determine a lexicographically optimum slack distribution. The clock skew scheduling problem is then generalized to allow for simultaneous data path optimization. In fact, this is a time-cost tradeoff problem. We developed the first combinatorial algorithm for computing time-cost tradeoff curves in graphs that may contain cycles. Starting from the lowest-cost solution, the algorithm iteratively computes a descent direction by a minimum cost flow computation. The maximum feasible step length is then determined by a minimum ratio cycle computation. This approach can be used in chip design for several optimization tasks, e.g. threshold voltage optimization or plane assignment. Finally, the optimization routines are combined into a timing closure flow. Here, the global placement is alternated with global performance optimization. Netweights are used to penalize the length of critical nets during placement. After the global phase, the performance is improved further by applying more comprehensive optimization routines on the most critical paths. In the end, the clock schedule is optimized and clocktrees are inserted. Computational results of the design flow are obtained on real-world computer chips

    Design and automation of voltage-scaled clock networks

    Get PDF
    In this dissertation, a vital step of VLSI physical design flow, synthesis of clock distribution networks, is investigated. Clock network synthesis (CNS) involves large and complex optimization problems to achieve high performance and low power demands of current integrated circuits (ICs). Ineffectiveness of existing methodologies to provide high performance at lower voltage nodes is the main driver for this dissertation research. A design and automation flow for voltage-scaled clock networks is proposed to satisfy tight timing constraints at high frequency (for high performance) and low voltage (for low power) operation. One implementation of voltage-scaled clock networks is low (voltage) swing clocking, which is a known technique, yet its applicability remains limited to designs with low performance demands. In this dissertation, novel methodologies are introduced to i) apply low swing clocking to legacy designs as a power saving methodology, ii) develop a complete CNS flow for low swing clocking of high performance ICs. These methodologies include slew-driven approaches that are better suited to future transistor and interconnect technologies. Second implementation of voltage-scaled clock networks is multi-voltage clocking, which is another known technique, yet its applicability remains limited to clock tree topology. In this dissertation, multi-voltage clocking with a clock mesh topology is investigated in order to address a missing aspect in the current IC design flows. Practical considerations of the current IC design flows are also investigated in this dissertation to expand the applicability of the proposed CNS flow. A novel methodology is introduced to facilitate clock gating within low swing clocking. The applicability of low swing clocking to FinFET technology, which is currently the industry norm, is shown to be effective.Ph.D., Electrical Engineering -- Drexel University, 201

    Clock Tree and Flip-flop Co-optimization for Reducing Power Consumption and Power/Ground Noise of Integrated Circuits and Systems

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ (๋ฐ•์‚ฌ)-- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› ๊ณต๊ณผ๋Œ€ํ•™ ์ „๊ธฐยท์ปดํ“จํ„ฐ๊ณตํ•™๋ถ€, 2017. 8. ๊น€ํƒœํ™˜.For very-large-scale integration (VLSI) circuits, the activation of all flip-flops that are used to store data is synchronized by clock signals delivered through clock networks. Due to very high frequency of clock signal switches, the dynamic power consumed on clock networks takes a considerable portion of the total power consumption of the circuits. In addition, the largest amount of power consumption in the clock networks comes from the flip-flops and the buffers that drive the flip-flops at the clock network boundary. In addition, the requirement of simultaneously activating all flip-flops for synchronous circuits induces a high peak power/ground noise (i.e., voltage drop) at the clock boundary. In this regards, this thesis addresses two new problems: the problem of reducing the clock power consumption at the clock network boundary, and the problem of reducing the peak current at the clock network boundary. Unlike the prior works which have considered the optimization of flip-flops and clock buffers separately, our approach takes into account the co-optimization of flip-flops and clock buffers. Precisely, we propose four different types of hardware component that can implement a set of flip-flops and their driving buffer as a single unit. The key idea for the derivation of the four types of clock boundary component is that one of the inverters in the driving buffer and one of the inverters in each flip-flop can be combined and removed without changing the functionality of the flip-flops. Consequently, we have a more freedom to select (i.e., allocate) clock boundary components that is able to reduce the power consumption or peak current under timing constraint. We have implemented our approach of clock boundary optimization under bounded clock skew constraint and tested it with ISCAS 89 benchmark circuits. The experimental results confirm that our approach is able to reduce the clock power consumption by 7.9โˆผ10.2% and power/ground noise by 27.7%โˆผ30.9% on average.Chapter 1 Introduction 1 1.1 Clock Signal 1 1.2 Metrics of Clock Design 2 1.3 Clock Network Topologies 4 1.4 Multibit Flip-flop 5 1.5 Simultaneous Switching Noise 6 1.6 Contributions of This Dissertation 6 Chapter 2 Clock Tree and Flip-flop Co-optimization for Reducing Power Consumption 8 2.1 Introduction 8 2.2 Types of Boundary Optimization 9 2.3 Analysis of Four Types of Flip-flop 12 2.3.1 Internal Power Comparison 12 2.3.2 Characterization of Power Consumption 14 2.4 Problem Formulation 15 2.5 The Proposed Algorithm 17 2.5.1 Independence Assumption 17 2.5.2 BoundaryMin Algorithm 17 2.6 Experimental Results 29 2.6.1 Experimental Setup 29 2.6.2 Clock Tree Boundary Optimization Results 33 2.6.3 Capacitance Analysis on Flip-flops 38 2.6.4 Slew and Skew Analysis 39 2.6.5 Window Width Analysis 39 2.7 Conclusions 41 Chapter 3 Clock Tree and Flip-flop Co-optimization for Reducing Power/Ground Noise 42 3.1 Introduction 42 3.2 Current Characteristic of Four Types of Flip-flop 45 3.3 Motivational Example 47 3.4 Problem Formulation 52 3.5 Proposed Algorithm 54 3.5.1 An Overview 54 3.5.2 Superposition of Current Flows 55 3.5.3 Formulation to Instance of MOSP Problem 57 3.5.4 Selecting Target Power Grid Points 59 3.5.5 Consideration of Reducing Power Consumption 62 3.6 Experimental Results 62 3.7 Summary 65 Chapter 4 Conclusion 68 4.1 Clock Buffer and Flip-flop Co-optimization for Reducing Power Consumption 68 4.2 Clock Buffer and Flip-flop Co-optimization for Reducing Power/Ground Noise 69 ์ดˆ๋ก 78Docto

    Clock tree synthesis for prescribed skew specifications

    Get PDF
    In ultra-deep submicron VLSI designs, clock network layout plays an increasingly important role in determining circuit performance including timing, power consumption, cost, power supply noise and tolerance to process variations. It is required that a clock layout algorithm can achieve any prescribed skews with the minimum wire length and acceptable slew rate. Traditional zero-skew clock routing methods are not adequate to address this demand, since they tend to yield excessive wire length for prescribed skew targets. The interactions among skew targets, sink location proximities and capacitive load balance are analyzed. Based on this analysis, a maximum delay-target ordering merging scheme is suggested to minimize wire and buffer area, which results in lesser cost, power consumption and vulnerability to process variations. During the clock routing, buffers are inserted simultaneously to facilitate a proper slew rate level and reduce wire snaking. The proposed algorithm is simple and fast for practical applications. Experimental results on benchmark circuits show that the algorithm can reduce the total wire and buffer capacitance by 60% over an extension of the existing zero-skew routing method

    Power Reductions with Energy Recovery Using Resonant Topologies

    Get PDF
    The problem of power densities in system-on-chips (SoCs) and processors has become more exacerbated recently, resulting in high cooling costs and reliability issues. One of the largest components of power consumption is the low skew clock distribution network (CDN), driving large load capacitance. This can consume as much as 70% of the total dynamic power that is lost as heat, needing elaborate sensing and cooling mechanisms. To mitigate this, resonant clocking has been utilized in several applications over the past decade. An improved energy recovering reconfigurable generalized series resonance (GSR) solution with all the critical support circuitry is developed in this work. This LC resonant clock driver is shown to save about 50% driver power (\u3e40% overall), on a 22nm process node and has 50% less skew than a non-resonant driver at 2GHz. It can operate down to 0.2GHz to support other energy savings techniques like dynamic voltage and frequency scaling (DVFS). As an example, GSR can be configured for the simpler pulse series resonance (PSR) operation to enable further power saving for double data rate (DDR) applications, by using de-skewing latches instead of flip-flop banks. A PSR based subsystem for 40% savings in clocking power with 40% driver active area reduction xii is demonstrated. This new resonant driver generates tracking pulses at each transition of clock for dual edge operation across DVFS. PSR clocking is designed to drive explicit-pulsed latches with negative setup time. Simulations using 45nm IBM/PTM device and interconnect technology models, clocking 1024 flip-flops show the reductions, compared to non-resonant clocking. DVFS range from 2GHz/1.3V to 200MHz/0.5V is obtained. The PSR frequency is set \u3e3ร— the clock rate, needing only 1/10th the inductance of prior-art LC resonance schemes. The skew reductions are achieved without needing to increase the interconnect widths owing to negative set-up times. Applications in data circuits are shown as well with a 90nm example. Parallel resonant and split-driver non-resonant configurations as well are derived from GSR. Tradeoffs in timing performance versus power, based on theoretical analysis, are compared for the first time and verified. This enables synthesis of an optimal topology for a given application from the GSR

    ์ €์ „๋ ฅ ๊ณ ์„ฑ๋Šฅ ๋””์ง€ํ„ธ ์‹œ์Šคํ…œ์„ ์œ„ํ•œ ๊ณ ์‹ ๋ขฐ๋„์˜ ํด๋Ÿญ ๋„คํŠธ์›Œํฌ ์„ค๊ณ„ ๋ฐฉ๋ฒ•๋ก 

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ (๋ฐ•์‚ฌ)-- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ์ „๊ธฐยท์ปดํ“จํ„ฐ๊ณตํ•™๋ถ€, 2015. 8. ๊น€ํƒœํ™˜.์˜ค๋Š˜๋‚ ์˜ ํšŒ๋กœ ์„ค๊ณ„์—์„œ ๊ณต์ •๋ณ€์ด๊ฐ€ ํšŒ๋กœ ํด๋Ÿญ์˜ ํƒ€์ด๋ฐ์˜ ๋ณ€์ด์— ๋ฏธ์น˜๋Š” ์˜ํ–ฅ์€ ๋งค์šฐ ์ปค์ง์— ๋”ฐ๋ผ, ์ „ํ†ต์ ์œผ๋กœ ์‚ฌ์šฉ๋˜๋˜ ํด๋Ÿญ ํŠธ๋ฆฌ ๊ตฌ์กฐ๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•œ ํด๋Ÿญ ๋„คํŠธ์›Œํฌ๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์€ ํ•œ๊ณ„์— ๋ถ€๋”ชํžˆ๊ฒŒ ๋˜์—ˆ๊ณ , ์ด๋ฅผ ๊ทน๋ณตํ•˜๊ธฐ ์œ„ํ•œ ์—ฌ๋Ÿฌ๊ฐ€์ง€ ๊ธฐ์ˆ ๋“ค์ด ์ œ์•ˆ๋˜์—ˆ๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ๋ณ€์ด์— ๊ฐ•ํ•œ ํด๋Ÿญ ๋„คํŠธ์›Œํฌ๋ฅผ ์„ค๊ณ„ํ•˜๊ธฐ ์œ„ํ•ด, ์—ฐ๊ตฌ ๋ฐ ์‚ฌ์šฉ๋˜๊ณ  ์žˆ๋Š” ์„ธ ๊ฐ€์ง€ ๊ธฐ์ˆ ์— ๋Œ€ํ•ด ์†Œ๊ฐœํ•˜๊ณ , ์ด๋“ค์„ ๊ฐœ์„ ํ•œ ์—ฐ๊ตฌ๋“ค์„ ์ œ์•ˆํ•œ๋‹ค. ์ฒซ์งธ๋กœ, ์ด ๋…ผ๋ฌธ์—์„œ๋Š” ํด๋Ÿญ์˜ ํƒ€์ด๋ฐ ๋ฌธ์ œ๋ฅผ ํšŒ๋กœ ์ œ์ž‘ ์ดํ›„ ๋‹จ๊ณ„์—์„œ ์กฐ์ •ํ•  ์ˆ˜ ์žˆ๋Š” ํฌ์ŠคํŠธ ์‹ค๋ฆฌ์ฝ˜ ์กฐ์ • ํด๋Ÿญ ๋ฒ„ํผ๋ฅผ ๋ฐฐ์น˜ํ•˜๋Š” ๋ฌธ์ œ์— ๋Œ€ํ•ด ์„œ์ˆ ํ•œ๋‹ค. ํฌ์ŠคํŠธ ์‹ค๋ฆฌ์ฝ˜ ์กฐ์ • ๋ฒ„ํผ๋Š” ํด๋Ÿญ์˜ ์ง€์—ฐ์‹œ๊ฐ„์„ ํšŒ๋กœ๊ฐ€ ์ œ์ž‘๋œ ์ดํ›„์˜ ๋‹จ๊ณ„์—์„œ ์กฐ์ •ํ•˜ ์—ฌ ํด๋Ÿญ์˜ ํƒ€์ด๋ฐ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•  ์ˆ˜ ์žˆ์ง€๋งŒ, ๋ฒ„ํผ ์ž์ฒด์˜ ํฌ๊ธฐ ๋•Œ๋ฌธ์— ์ตœ์†Œํ•œ์˜ ๊ฐœ์ˆ˜๋งŒ ๊ฐ€์žฅ ํšจ์œจ์ ์ธ ์œ„์น˜์— ๋ฐฐ์น˜ํ•ด์•ผ ํ•˜๋Š” ๋ฌธ์ œ๊ฐ€ ์žˆ๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ์ด์ „์˜ ์—ฐ๊ตฌ๊ฐ€ ํšŒ๋กœ์˜ ์ˆ˜์œจ์„ ๊ณ„์‚ฐํ•  ๋•Œ ์‹œ๊ฐ„์ด ๋งŽ์ด ๊ฑธ๋ฆฌ๋Š” ๋ชฌํ…Œ-์นด๋ฅผ๋กœ ์‹œ๋ฎฌ๋ ˆ์ด์…˜์„ ์‚ฌ์šฉํ•˜๊ธฐ ๋•Œ๋ฌธ์— ํƒ์ƒ‰ ๊ฐ€๋Šฅํ•œ ํฌ์ŠคํŠธ ์‹ค๋ฆฌ์ฝ˜ ์กฐ์ • ๋ฒ„ํผ์˜ ๋ฐฐ์น˜๊ฐ€ ์ œํ•œ๋˜๋Š” ๋ฌธ์ œ๊ฐ€ ์žˆ์Œ์„ ์ง€์ ํ•œ ํ›„, ๊ธฐ์กด์— ์ œ์•ˆ๋˜์—ˆ๋˜ ๊ทธ๋ž˜ํ”„ ๊ธฐ๋ฐ˜ ํšŒ๋กœ ์ˆ˜์œจ ๊ณ„์‚ฐ ๊ธฐ๋ฒ•์„ ์‚ฌ์šฉํ•˜์—ฌ ํšจ์œจ์ ์ธ ํฌ์ŠคํŠธ ์‹ค๋ฆฌ์ฝ˜ ์กฐ์ • ๋ฒ„ํผ ๋ฐฐ์น˜๋ฅผ ์ฐพ์„ ์ˆ˜ ์žˆ๋Š” ์ ์ง„์ ์ด๊ณ  ์ฒด๊ณ„์ ์ธ ๋ฐฉ๋ฒ•์„ ์ œ์‹œํ•œ๋‹ค. ๋‹ค์Œ์€ ํด๋Ÿญ ์‹œ์ฐจ ์Šค์ผ€์ฅด๋ง ๋ฐฉ๋ฒ•์— ๋Œ€ํ•œ ์—ฐ๊ตฌ๋ฅผ ์„œ์ˆ ํ•œ๋‹ค. ์ตœ๊ทผ์˜ ์—ฐ๊ตฌ์—์„œ ์ œ์•ˆ๋˜์—ˆ๋˜, ํ”Œ๋ฆฝ-ํ”Œ๋กญ์˜ ํด๋Ÿญ์—์„œ ์ถœ๋ ฅ๊นŒ์ง€์˜ ๋”œ๋ ˆ์ด๊ฐ€ ํด๋Ÿญ์˜ ์ค€๋น„์‹œ๊ฐ„๊ณผ ์œ ์ง€์‹œ๊ฐ„์— ์˜์กดํ•œ๋‹ค๋Š” ์œ ์—ฐํ•œ ํ”Œ๋ฆฝ-ํ”Œ๋กญ ํƒ€์ด๋ฐ ๋ชจ๋ธ ์—ฐ๊ตฌ๋Š” ๊ธฐ์กด์˜ ํ”Œ๋ฆฝ-ํ”Œ๋กญ์˜ ํƒ€์ด๋ฐ ํŠน์„ฑ๋“ค์ด ๊ณ ์ •๋œ ๊ฐ’์ด๋ผ๋Š” ๊ฐ€์ •์— ๊ธฐ๋ฐ˜ํ•œ ์ •์  ํƒ€์ด๋ฐ ๋ถ„์„์˜ ์ •ํ™•์„ฑ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•  ์ˆ˜ ์žˆ๋Š” ์ค‘์š”ํ•œ ์—ฐ๊ตฌ์ด๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ์ƒˆ๋กœ์šด ๋ชจ๋ธ์„ ๊ณ ๋ คํ•˜์—ฌ, ์ด์ „์— ๊ณ ์ „์ ์ธ ํ”Œ๋ฆฝ-ํ”Œ๋กญ ํƒ€์ด๋ฐ ํŠน์„ฑ ๋ชจ๋ธ์„ ๊ธฐ๋ฐ˜์œผ๋กœ ์ง„ํ–‰๋˜์—ˆ๋˜ ํด๋Ÿญ ์‹œ์ฐจ ์Šค์ผ€์ฅด๋ง์˜ ์ตœ์ ํ™” ๋ฌธ์ œ๋ฅผ ์œ ์—ฐํ•œ ํ”Œ๋ฆฝ-ํ”Œ๋กญ ํƒ€์ด๋ฐ ๋ชจ๋ธ์„ ๊ณ ๋ คํ•˜์—ฌ ํ•ด๊ฒฐํ•˜์˜€๋‹ค. ๋ณธ ์—ฐ๊ตฌ์—์„œ๋Š” ์ฃผ์–ด์ง„ ํšŒ๋กœ์˜ ์ค€๋น„์‹œ๊ฐ„๊ณผ ์œ ์ง€์‹œ๊ฐ„์˜ ์—ฌ์œ ์‹œ๊ฐ„์„ ๋ฐ˜๋ณต์ ์ด๊ณ  ์ฒด๊ณ„์ ์œผ๋กœ ์ตœ๋Œ€ํ™”ํ•˜์—ฌ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜์˜€๋‹ค. ๋งˆ์ง€๋ง‰์œผ๋กœ ํด๋Ÿญ ์ŠคํŒŒ์ธ ๋„คํŠธ์›Œํฌ์˜ ํ•ฉ์„ฑ์„ ์ž๋™ํ™”ํ•˜๋Š” ๋ฌธ์ œ์— ๋Œ€ํ•ด ์„œ์ˆ ํ•œ๋‹ค. ์ „ํ†ต์ ์ธ ํด๋Ÿญ ํŠธ๋ฆฌ ๊ตฌ์กฐ๊ฐ€ ๊ณต์ •๋ณ€์ด ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜์ง€ ๋ชปํ–ˆ๊ธฐ ๋•Œ๋ฌธ์— ํด๋Ÿญ ๋ฉ”์‰ฌ๋ฅผ ํฌํ•จํ•˜๋Š” ๋‹ค์–‘ํ•œ ๋Œ€์•ˆ์  ๊ตฌ์กฐ๊ฐ€ ์ œ์•ˆ๋˜์—ˆ๋‹ค. ํด๋Ÿญ ๋ฉ”์‰ฌ์˜ ๊ฒฝ์šฐ ๊ณต์ •๋ณ€์ด์— ์˜ํ•œ ํด๋Ÿญ ์‹œ์ฐจ๋ฅผ ์ค„์ผ ์ˆ˜ ์žˆ์—ˆ์ง€๋งŒ ์ด๋ฅผ ์œ„ํ•ด ์™€์ด์–ด๋‚˜ ๋ฒ„ํผ ๋“ฑ์˜ ์ž์›์„ ๋งŽ์ด ์†Œ๋ชจํ•˜๋Š” ๋ฌธ์ œ๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ๋‹ค. ๋‘ ๊ตฌ์กฐ์˜ ์ค‘๊ฐ„์  ๊ตฌ์กฐ์—๋Š” ํด๋Ÿญ ํŠธ๋ฆฌ์˜ ๋…ธ๋“œ๋ฅผ ์—ฐ๊ฒฐํ•˜๋Š” ํฌ๋กœ์Šค ๋งํฌ๋ฅผ ์‚ฝ์ž…ํ•˜๋Š” ๊ตฌ์กฐ์™€ ํด๋Ÿญ ์ŠคํŒŒ์ธ ๊ตฌ์กฐ๊ฐ€ ์žˆ๋‹ค. ํด๋Ÿญ ํŠธ๋ฆฌ์— ์ ์ง„์ ์ธ ์ˆ˜์ •์„ ๊ฐ€ํ•˜์—ฌ ๋งŒ๋“œ๋Š” ํฌ๋กœ์Šค ๋งํฌ์™€ ๋‹ฌ๋ฆฌ, ํด๋Ÿญ ์ŠคํŒŒ์ธ ๊ตฌ์กฐ๋Š” ํŠธ๋ฆฌ๋‚˜ ์ดํ›„์— ์ œ์•ˆ๋œ ๋ฉ”์‰ฌ์™€๋Š” ์™„์ „ํžˆ ๋ณ„๊ฐœ์˜ ๊ตฌ์กฐ๋กœ, ์ด๋ฅผ ํ•ฉ์„ฑํ•˜๋Š” ๋ฐฉ๋ฒ•๋„ ๋งค์šฐ ๋‹ค๋ฅด๋‹ค. ๊ทธ๋ ‡๊ธฐ ๋•Œ๋ฌธ์— ํด๋Ÿญ ์ŠคํŒŒ์ธ์„ ํ•ฉ์„ฑํ•˜๋Š” ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ํ•„์ˆ˜์ ์ด๋ผ๊ณ  ํ•  ์ˆ˜ ์žˆ์œผ๋‚˜, ํ•ฉ์„ฑ ๋ฐฉ๋ฒ•๋ก ์ด๋‚˜ ์ด๋ฅผ ์ž๋™ํ™”ํ•˜๋Š” ๋ฐฉ๋ฒ•์— ๊ด€ํ•œ ์—ฐ๊ตฌ๋Š” ์•„์ง ์—†๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ์šฐ์„ , ํด๋Ÿญ-๊ฒŒ์ดํŒ…์„ ์ง€์›ํ•˜๋Š” ํด๋Ÿญ ์ŠคํŒŒ์ธ์„ ์ฃผ์–ด์ง„ ํด๋Ÿญ ์‹œ์ฐจ ๋ฐ ํด๋Ÿญ ์Šฌ๋ฃจ ์กฐ๊ฑด์„ ๋งŒ์กฑํ•˜๋ฉด์„œ ์ž์› ๋ฐ ์ „๋ ฅ ์†Œ๋ชจ๋Ÿ‰์„ ์ตœ์†Œํ™”ํ•˜๋Š” ๋ฌธ์ œ์— ๋Œ€ํ•ด ์„œ์ˆ ํ•œ๋‹ค. ๊ทธ๋ฆฌ๊ณ , ํšŒ๋กœ์—์„œ ์ฃผ์–ด์ง„ ํ”Œ๋ฆฝ-ํ”Œ๋กญ๋“ค์„ ํด๋Ÿญ-๊ฒŒ์ดํŒ… ์กฐ๊ฑด์—์„œ์˜ ์—ฐ๊ด€์„ฑ์„ ๊ณ ๋ คํ•˜๊ณ  ์กฐ์งํ™”ํ•˜์—ฌ ํด๋Ÿญ ์ŠคํŒŒ์ธ์„ ์‚ฝ์ž…ํ•œ ํ›„, ํด๋Ÿญ ์‹œ์ฐจ ๋ฐ ์Šฌ๋ฃจ ์กฐ๊ฑด์„ ๊ณ ๋ คํ•˜์—ฌ ๋ฒ„ํผ๋ฅผ ์‚ฝ์ž…ํ•˜๋Š” ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์ œ์•ˆํ•œ๋‹ค. ์š”์•ฝํ•˜๋ฉด, ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ํด๋Ÿญ์˜ ํƒ€์ด๋ฐ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด ํฌ์ŠคํŠธ-์‹ค๋ฆฌ์ฝ˜ ์กฐ์ • ํด๋Ÿญ ๋ฒ„ํผ๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ํ…Œํฌ๋‹‰๊ณผ ํด๋Ÿญ ์‹œ์ฐจ ์Šค์ผ€์ฅด๋ง์„ ์œ ์—ฐํ•œ ํ”Œ๋ฆฝ-ํ”Œ๋กญ ํƒ€์ด๋ฐ ๋ชจ๋ธ์—์„œ ์ ์šฉํ•˜๋Š” ํ…Œํฌ๋‹‰์„ ์ œ์‹œํ•˜๊ณ , ํด๋Ÿญ์˜ ํƒ€์ด๋ฐ ๋ฌธ์ œ์™€ ์ „๋ ฅ ์†Œ๋ชจ ๋ฌธ์ œ๋ฅผ ํ•œ๋ฒˆ์— ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•œ ์ƒˆ๋กœ์šด ํด๋Ÿญ ์ŠคํŒŒ์ธ ๋„คํŠธ์›Œํฌ๋ฅผ ํ•ฉ์„ฑํ•˜๋Š” ์ž๋™ํ™” ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์ œ์‹œํ•œ๋‹ค.As the process variation is dominating to cause the clock timing variation among chips to be much large, conventional clock tree based clock network is not able to guarantee the timing constraint of a digital system. To overcome the limitations of traditional clock design techniques, various techniques have been studied. This dissertation addresses three techniques that have been widely used for designing robust clock network and proposes developed methods. First, it is widely accepted that post-silicon tunable (PST) clock buffers can effectively resolve the clock timing violation. Since PST buffers, which can reset the clock delay to flip-flops after the chip is manufactured, impose a non-trivial implementation area and control circuitry, it is very important to minimally allocate PST buffers while satisfying the chip yield constraint. In this dissertation, we (1) develop a graph-based chip yield computation technique which can update yields very efficiently and accurately for incremental PST buffer allocation, based on which we (2) propose a systematic (bottom-up and top-down with refinement) PST buffer allocation algorithm that is able to fully explore the design space of PST buffer allocation. Second, clock skew scheduling is one of the essential steps that must be carefully performed during the design process. This dissertation addresses the clock skew optimization problem integrated with the consideration of the interdependent relation between the setup and hold skews, and clk-to-Q delay of flip-flops, so that the time margin is more accurately and reliably set aside over that of the previous methods, which have never taken the integrated problem into account. Precisely, based on an accurate flexible model of setup skew, hold skew, and clk-to-Q delay, we propose a stepwise clock skew scheduling technique in which at each iteration, the worst slack of setup and hold skews is systematically and incrementally relaxed to maximally extend the time margin. Lastly, clock tree with cross links and clock spine have an intermediate characteristics for skew tolerance and power consumption, compared to clock tree and clock mesh which are two extreme structures of clock network. Unlike the clock tree with links between clock nodes, which is a sort of an incremental modification of the structure of clock tree, clock spine network is a completely separated structure from the structures of tree and mesh. Consequently, it is necessary and essential to develop a synthesis algorithm for clock spines, which will be compatible to the existing synthesis algorithms of clock trees and clock meshes. To this end, this dissertation first addresses the problem of automating the synthesis of clock-gated clock spines with the objective of minimizing total clock power while meeting the clock skew and slew constraints. The key idea of our proposed synthesis algorithm is to identify and group the flip-flops with tight correlation of clock-gating operations together to form a spine while accurately predicting and maintaining clock skew and slew variations through the buffer insertion and stub allocation. In summary, this dissertation presents clock tuning techniques with consideration of post-silicon tuning, flexible flip-flop timing model, and clock-gated clock spine synthesis algorithm.Abstract i Chapter 1 INTRODUCTION 1 1.1 Clock Distribution Network . . . . . . . . . . . . . . . . . . . . . 1 1.2 Process Variation . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.3 Flexible Flip-flop Timing Model . . . . . . . . . . . . . . . . . . . 3 1.4 Clock Spine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.5 Contributions of This Dissertation . . . . . . . . . . . . . . . . . 6 Chapter 2 POST-SILICON TUNABLE CLOCK BUFFER ALLOCATION BASED ON FAST CHIP YIELD COMPUTATION 8 2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 2.2 Systematic Exploration of PST Buffer Allocation . . . . . . . . . 10 2.2.1 Observations . . . . . . . . . . . . . . . . . . . . . . . . . 10 2.2.2 Problem Definition . . . . . . . . . . . . . . . . . . . . . . 15 2.2.3 Allocation Algorithm . . . . . . . . . . . . . . . . . . . . . 16 2.3 Fast Timing Yield Computation . . . . . . . . . . . . . . . . . . 17 2.3.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . 17 2.3.2 Incremental Yield Computation . . . . . . . . . . . . . . . 22 2.4 Experimental Result . . . . . . . . . . . . . . . . . . . . . . . . . 24 2.5 PST Buffer Configuration Techniques . . . . . . . . . . . . . . . 31 2.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 Chapter 3 POST-SILICON TUNING BASED ON FLEXIBLE FLIP-FLOP TIMING 34 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 3.2 Preliminary and Definitions . . . . . . . . . . . . . . . . . . . . . 40 3.2.1 Flexible Flip-Flop Timing Model . . . . . . . . . . . . . . 40 3.2.2 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . 40 3.3 Motivational Examples . . . . . . . . . . . . . . . . . . . . . . . . 42 3.4 Clock Skew Scheduling for Slack Relaxation Based on Flexible Flip-Flop Timing . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 3.4.1 Overall Flow . . . . . . . . . . . . . . . . . . . . . . . . . 46 3.4.2 Finding Local Clock Skew Schedule . . . . . . . . . . . . 48 3.5 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . 51 3.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 Chapter 4 SYNTHESIS FOR POWER-AWARE CLOCK SPINES 61 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 4.2 Preliminaries and Motivation . . . . . . . . . . . . . . . . . . . . 64 4.2.1 Clock Spine . . . . . . . . . . . . . . . . . . . . . . . . . . 64 4.2.2 Activity Patterns . . . . . . . . . . . . . . . . . . . . . . . 67 4.2.3 Power Computation . . . . . . . . . . . . . . . . . . . . . 67 4.3 Algorithm for Clock Spine Synthesis . . . . . . . . . . . . . . . . 68 4.3.1 Problem Definition . . . . . . . . . . . . . . . . . . . . . . 68 4.3.2 Power-Aware Sink Clustering . . . . . . . . . . . . . . . . 70 4.3.3 Spine Relaxation . . . . . . . . . . . . . . . . . . . . . . . 77 4.3.4 Spine Buffer Allocation . . . . . . . . . . . . . . . . . . . 80 4.3.5 Top-Level Tree Construction . . . . . . . . . . . . . . . . 86 4.4 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . 86 4.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91 Chapter 5 CONCLUSION 95 5.1 Chapter 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95 5.2 Chapter 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95 5.3 Chapter 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96 Bibliography 97 ์ดˆ๋ก 106Docto
    • โ€ฆ
    corecore