2,497 research outputs found

    Physical Design and Clock Tree Synthesis Methods For A 8-Bit Processor

    Many processors with a wide range of features are now available from different vendors. This work designs, in Verilog, a processor whose architecture resembles current processors but omits memory blocks such as RAM and ROM. The processor comprises a program counter, instruction register, ALU, ALU latch, general-purpose registers, control state module, flag registers, and a core module that integrates all of these; a separate test module exercises the processor. Once functionally verified, the processor is synthesized in a 180nm technology. Synthesis includes data-path optimization, such as selecting suitable adders and multipliers to improve timing on the data path used by ALU operations. The synthesis discussion covers how to manage the worst negative slack (WNS), how to insert clock-gating cells, and how to define cost and path groups. Synthesis yields the netlist and the synthesized constraint file needed for physical design. The physical design steps performed include floor-planning, partitioning, placement, placement legalization, clock tree synthesis, and routing. Static timing analysis is performed at every stage to confirm that the design meets timing and to improve the achievable frequency. Each physical design step is discussed with emphasis on the concepts behind it. Among these steps, clock tree synthesis is improved by creating a symmetrical clock tree and maintaining more common clock paths. A dedicated algorithm is devised for creating the symmetrical clock tree, which lowers the clock tree's power consumption.
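
The worst negative slack mentioned in this abstract can be illustrated with a small calculation. The sketch below is not taken from the thesis: the endpoint names and slack values are hypothetical, and it assumes the common convention that WNS is the smallest endpoint slack and that total negative slack (TNS) is the sum of the negative slacks only.

```python
def slack_summary(endpoint_slacks_ns):
    """Return (WNS, TNS) for a mapping of timing endpoint -> slack in ns.

    Assumed convention: WNS is the minimum slack over all endpoints,
    TNS is the sum of the slacks that are below zero.
    """
    slacks = list(endpoint_slacks_ns.values())
    wns = min(slacks)
    tns = sum(s for s in slacks if s < 0)
    return wns, tns


# Hypothetical endpoints, named after blocks listed in the abstract.
example_slacks = {
    "alu_latch/D": -0.12,
    "pc_reg/D": 0.30,
    "flag_reg/D": -0.05,
}
wns, tns = slack_summary(example_slacks)
print(f"WNS = {wns:.2f} ns, TNS = {tns:.2f} ns")
```

A design meets timing under this convention when WNS is zero or positive, which is the condition checked after each physical design stage.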

    Power Reductions with Energy Recovery Using Resonant Topologies

    The problem of power densities in system-on-chips (SoCs) and processors has become more exacerbated recently, resulting in high cooling costs and reliability issues. One of the largest components of power consumption is the low skew clock distribution network (CDN), driving large load capacitance. This can consume as much as 70% of the total dynamic power that is lost as heat, needing elaborate sensing and cooling mechanisms. To mitigate this, resonant clocking has been utilized in several applications over the past decade. An improved energy recovering reconfigurable generalized series resonance (GSR) solution with all the critical support circuitry is developed in this work. This LC resonant clock driver is shown to save about 50% driver power (>40% overall), on a 22nm process node and has 50% less skew than a non-resonant driver at 2GHz. It can operate down to 0.2GHz to support other energy savings techniques like dynamic voltage and frequency scaling (DVFS). As an example, GSR can be configured for the simpler pulse series resonance (PSR) operation to enable further power saving for double data rate (DDR) applications, by using de-skewing latches instead of flip-flop banks. A PSR based subsystem for 40% savings in clocking power with 40% driver active area reduction is demonstrated. This new resonant driver generates tracking pulses at each transition of clock for dual edge operation across DVFS. PSR clocking is designed to drive explicit-pulsed latches with negative setup time. Simulations using 45nm IBM/PTM device and interconnect technology models, clocking 1024 flip-flops show the reductions, compared to non-resonant clocking. DVFS range from 2GHz/1.3V to 200MHz/0.5V is obtained. The PSR frequency is set >3× the clock rate, needing only 1/10th the inductance of prior-art LC resonance schemes. The skew reductions are achieved without needing to increase the interconnect widths owing to negative set-up times. 
Applications in data circuits are shown as well with a 90nm example. Parallel resonant and split-driver non-resonant configurations as well are derived from GSR. Tradeoffs in timing performance versus power, based on theoretical analysis, are compared for the first time and verified. This enables synthesis of an optimal topology for a given application from the GSR
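
The statement that a PSR frequency above three times the clock rate needs roughly one tenth of the inductance can be checked against the ideal LC resonance relation. The derivation below is only a sketch: it assumes a lumped, lossless LC tank, the same load capacitance $C$ in both schemes, and a prior-art scheme that resonates at the clock frequency $f_{\mathrm{clk}}$. None of these assumptions is stated in the abstract.

```latex
f_{\mathrm{res}} = \frac{1}{2\pi\sqrt{LC}}
\quad\Longrightarrow\quad
L = \frac{1}{(2\pi f_{\mathrm{res}})^{2}\,C}
\\[4pt]
\frac{L_{\mathrm{PSR}}}{L_{\mathrm{prior}}}
= \left(\frac{f_{\mathrm{clk}}}{f_{\mathrm{PSR}}}\right)^{2}
< \frac{1}{9}
\quad\text{for } f_{\mathrm{PSR}} > 3 f_{\mathrm{clk}}
```

Under these assumptions a ratio of exactly 1/10 corresponds to $f_{\mathrm{PSR}} = \sqrt{10}\, f_{\mathrm{clk}} \approx 3.16\, f_{\mathrm{clk}}$, which is consistent with the ">3×" figure quoted above.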

    Timing Measurement Platform for Arbitrary Black-Box Circuits Based on Transition Probability

    No full text

    Desynchronization: Synthesis of asynchronous circuits from synchronous specifications

    Asynchronous implementation techniques, which measure logic delays at run time and activate registers accordingly, are inherently more robust than their synchronous counterparts, which estimate worst-case delays at design time, and constrain the clock cycle accordingly. De-synchronization is a new paradigm to automate the design of asynchronous circuits from synchronous specifications, thus permitting widespread adoption of asynchronicity, without requiring special design skills or tools. In this paper, we first of all study different protocols for de-synchronization and formally prove their correctness, using techniques originally developed for distributed deployment of synchronous language specifications. We also provide a taxonomy of existing protocols for asynchronous latch controllers, covering in particular the four-phase handshake protocols devised in the literature for micro-pipelines. We then propose a new controller which exhibits provably maximal concurrency, and analyze the performance of desynchronized circuits with respect to the original synchronous optimized implementation. We finally prove the feasibility and effectiveness of our approach, by showing its application to a set of real designs, including a complete implementation of the DLX microprocessor architecture

    High-performance and Low-power Clock Network Synthesis in the Presence of Variation.

    Full text link
    Semiconductor technology scaling requires continuous evolution of all aspects of physical design of integrated circuits. Among the major design steps, clock-network synthesis has been greatly affected by technology scaling, rendering existing methodologies inadequate. Clock routing was previously sufficient for smaller ICs, but design difficulty and structural complexity have greatly increased as interconnect delay and clock frequency increased in the 1990s. Since a clock network directly influences IC performance and often consumes a substantial portion of total power, both academia and industry developed synthesis methodologies to achieve low skew, low power and robustness from PVT variations. Nevertheless, clock network synthesis under tight constraints is currently the least automated step in physical design and requires significant manual intervention, undermining turn-around-time. The need for multi-objective optimization over a large parameter space and the increasing impact of process variation make clock network synthesis particularly challenging. Our work identifies new objectives, constraints and concerns in the clock-network synthesis for systems-on-chips and microprocessors. To address them, we generate novel clock-network structures and propose changes in traditional physical-design flows. We develop new modeling techniques and algorithms for clock power optimization subject to tight skew constraints in the presence of process variations. In particular, we offer SPICE-accurate optimizations of clock networks, coordinated to reduce nominal skew below 5 ps, satisfy slew constraints and trade-off skew, insertion delay and power, while tolerating variations. To broaden the scope of clock-network-synthesis optimizations, we propose new techniques and a methodology to reduce dynamic power consumption by 6.8%-11.6% for large IC designs with macro blocks by integrating clock network synthesis within global placement. 
We also present a novel non-tree topology that is 2.3x more power-efficient than mesh structures. We fuse several clock trees to create large-scale redundancy in a clock network to bridge the gap between tree-like and mesh-like topologies. Integrated optimization techniques for high-quality clock networks described in this dissertation show strong empirical results in experiments with recent industry-released benchmarks in the presence of process variation. Our software implementations were recognized with the first-place awards at the ISPD 2009 and ISPD 2010 Clock-Network Synthesis Contests organized by IBM Research and Intel Research. Ph.D., Electrical Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/89711/1/ejdjsy_1.pd

    Elastic bundles :modelling and architecting asynchronous circuits with granular rigidity

    PhD Thesis. Integrated Circuit (IC) designs these days are predominantly System-on-Chips (SoCs). The complexity of designing a SoC has increased rapidly over the years due to growing process and environmental variations coupled with global clock distribution difficulty. Moreover, traditional synchronous design is not apt to handle the heterogeneous timing nature of modern SoCs. As a countermeasure, the semiconductor industry witnessed a strong revival of asynchronous design principles. A new paradigm of digital circuits emerged, as a result, namely mixed synchronous-asynchronous circuits. With a wave of recent innovations in synchronous-asynchronous CAD integration, this paradigm is showing signs of commercial adoption in future SoCs mainly due to the scope for reuse of synchronous functional blocks and IP cores, and the co-existence of synchronous and asynchronous design styles in a common EDA framework. However, there is a lack of formal methods and tools to facilitate mixed synchronous-asynchronous design. In this thesis, we propose a formal model based on Petri nets with step semantics to describe these circuits behaviourally. Implication of this model in the verification and synthesis of mixed synchronous-asynchronous circuits is studied. To date, this paradigm has been mainly explored on the basis of Globally Asynchronous Locally Synchronous (GALS) systems. Despite decades of research, GALS design has failed to gain traction commercially. To understand its drawbacks, a simulation framework characterising the physical and functional aspects of GALS SoCs is presented. A novel method for synthesising mixed synchronous-asynchronous circuits with varying levels of rigidity is proposed. Starting with a high-level dataflow model of a system which is intrinsically asynchronous, the key idea is to introduce rigidity of chosen granularity levels in the model without changing functional behaviour. 
The system is then partitioned into functional blocks of synchronous and asynchronous elements before being transformed into an equivalent circuit which can be synthesised using standard EDA tools

    Course grained low power design flow using UPF

    Increased system complexity has led to the substitution of the traditional bottom-up design flow by a systematic hierarchical design flow. The main motivation behind the evolution of such an approach is the increasing difficulty in hardware realization of complex systems. With decreasing channel lengths, a few key problems such as timing closure, design sign-off, routing complexity, signal integrity, and power dissipation arise in the design flows. Specifically, minimizing power dissipation is critical in several high-end processors. In high-end processors, the design complexity contributes to the overall dynamic power while the decreasing transistor size results in static power dissipation. This research aims at optimizing the design flow for power and timing using the unified power format (UPF). UPF provides a strategic format to specify power-aware design information at every stage in the flow. The low power techniques enforced in this research are multi-voltage, multi-threshold voltage (Vth), and power gating with state retention. An inherent design challenge addressed in this research is the choice of power optimization techniques as the flow advances from synthesis to physical design. A top-down digital design flow for a 32-bit MIPS RISC processor has been implemented with and without the UPF synthesis flow for 65nm technology. The UPF synthesis is implemented with two voltages, 1.08V and 0.864V (Multi-VDD). Area, power and timing metrics are analyzed for the flows developed. Power savings of about 20% are achieved in the design flow with the 'multi-threshold' power technique compared to that of the design flow with no low power techniques employed. Similarly, 30% power savings are achieved in the design flow with the UPF implemented when compared to that of the design flow with the 'multi-threshold' power technique employed. 
Thus, a cumulative power savings of 42% has been achieved in a complete power efficient design flow (UPF) compared to that of the generic top-down standard flow with no power saving techniques employed. This is substantiated by the low voltage operation of modules in the design, reduction in clock switching power by gating clocks in the design and extensive use of HVT and LVT standard cells for implementation. The UPF synthesis flow saw the worst timing slack and more area when compared to those of the 'multi-threshold' or the generic flow. Percentage increase in the area with UPF is approximately 15%; a significant source for this increase being the additional power controlling logic added
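
The cumulative figure can be sanity-checked by compounding the two stage-wise savings, since each percentage is measured relative to the previous flow. The sketch below is an illustration only: the function name is hypothetical, and it assumes the quoted 20% and 30% are exact, which the abstract does not claim ("about 20%").

```python
def cumulative_savings(stage_savings):
    """Compound a sequence of relative savings, each taken against the
    power remaining after the previous stage."""
    remaining = 1.0
    for saving in stage_savings:
        remaining *= 1.0 - saving
    return 1.0 - remaining


# Generic flow -> multi-threshold flow (about 20%) -> UPF flow (30%).
total = cumulative_savings([0.20, 0.30])
print(f"{total:.0%}")  # prints 44%
```

Compounding the rounded figures gives 44%, close to the reported 42%. The small gap is what one would expect if the first-stage saving is slightly below 20%, although the abstract does not give the unrounded values.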

    High-Reliability Clock Network Design Methodology for Low-Power, High-Performance Digital Systems

    Doctoral thesis, Graduate School of Seoul National University, Department of Electrical and Computer Engineering, August 2015. Kim Tae-hwan.
    As process variation increasingly dominates the clock timing variation among chips, a conventional tree-based clock network can no longer guarantee the timing constraints of a digital system. Various techniques have been studied to overcome the limitations of traditional clock design. This dissertation addresses three techniques that have been widely used for designing robust clock networks and proposes improved methods for each. First, it is widely accepted that post-silicon tunable (PST) clock buffers can effectively resolve clock timing violations. Because PST buffers, which can reset the clock delay to flip-flops after the chip is manufactured, impose non-trivial implementation area and control circuitry, it is very important to allocate as few PST buffers as possible while satisfying the chip yield constraint. In this dissertation, we (1) develop a graph-based chip yield computation technique that updates yields very efficiently and accurately for incremental PST buffer allocation, based on which we (2) propose a systematic (bottom-up and top-down with refinement) PST buffer allocation algorithm that is able to fully explore the design space of PST buffer allocation. Second, clock skew scheduling is one of the essential steps that must be carefully performed during the design process. This dissertation addresses the clock skew optimization problem together with the interdependence among the setup skew, hold skew, and clk-to-Q delay of flip-flops, so that the time margin is set aside more accurately and reliably than in previous methods, which have never taken the integrated problem into account. Specifically, based on an accurate flexible model of setup skew, hold skew, and clk-to-Q delay, we propose a stepwise clock skew scheduling technique in which, at each iteration, the worst slack of setup and hold skews is systematically and incrementally relaxed to maximally extend the time margin. Lastly, a clock tree with cross links and a clock spine have intermediate characteristics for skew tolerance and power consumption compared to the clock tree and the clock mesh, which are the two extreme structures of a clock network. Unlike the clock tree with links between clock nodes, which is an incremental modification of the clock tree structure, the clock spine network is a completely separate structure from both the tree and the mesh. Consequently, it is essential to develop a synthesis algorithm for clock spines that is compatible with the existing synthesis algorithms for clock trees and clock meshes. To this end, this dissertation first addresses the problem of automating the synthesis of clock-gated clock spines with the objective of minimizing total clock power while meeting the clock skew and slew constraints. The key idea of our proposed synthesis algorithm is to identify and group the flip-flops with tightly correlated clock-gating operations to form a spine, while accurately predicting and maintaining clock skew and slew variations through buffer insertion and stub allocation. In summary, this dissertation presents clock tuning techniques that consider post-silicon tuning and a flexible flip-flop timing model, together with a clock-gated clock spine synthesis algorithm.
    Contents
    Abstract
    Chapter 1 INTRODUCTION
        1.1 Clock Distribution Network
        1.2 Process Variation
        1.3 Flexible Flip-flop Timing Model
        1.4 Clock Spine
        1.5 Contributions of This Dissertation
    Chapter 2 POST-SILICON TUNABLE CLOCK BUFFER ALLOCATION BASED ON FAST CHIP YIELD COMPUTATION
        2.1 Introduction
        2.2 Systematic Exploration of PST Buffer Allocation
            2.2.1 Observations
            2.2.2 Problem Definition
            2.2.3 Allocation Algorithm
        2.3 Fast Timing Yield Computation
            2.3.1 Preliminaries
            2.3.2 Incremental Yield Computation
        2.4 Experimental Result
        2.5 PST Buffer Configuration Techniques
        2.6 Summary
    Chapter 3 POST-SILICON TUNING BASED ON FLEXIBLE FLIP-FLOP TIMING
        3.1 Introduction
        3.2 Preliminary and Definitions
            3.2.1 Flexible Flip-Flop Timing Model
            3.2.2 Definitions
        3.3 Motivational Examples
        3.4 Clock Skew Scheduling for Slack Relaxation Based on Flexible Flip-Flop Timing
            3.4.1 Overall Flow
            3.4.2 Finding Local Clock Skew Schedule
        3.5 Experimental Results
        3.6 Summary
    Chapter 4 SYNTHESIS FOR POWER-AWARE CLOCK SPINES
        4.1 Introduction
        4.2 Preliminaries and Motivation
            4.2.1 Clock Spine
            4.2.2 Activity Patterns
            4.2.3 Power Computation
        4.3 Algorithm for Clock Spine Synthesis
            4.3.1 Problem Definition
            4.3.2 Power-Aware Sink Clustering
            4.3.3 Spine Relaxation
            4.3.4 Spine Buffer Allocation
            4.3.5 Top-Level Tree Construction
        4.4 Experimental Results
        4.5 Summary
    Chapter 5 CONCLUSION
        5.1 Chapter 2
        5.2 Chapter 3
        5.3 Chapter 4
    Bibliography
    Abstract (in Korean)

    Physical design of USB1.1

    In earlier days, interfacing peripheral devices to a host computer was problematic. Many different kinds of ports existed, such as the serial port, parallel port, and PS/2, and they were restrictive in many situations, for example offering no hot-pluggability and no automatic configuration. There were very few ways to connect peripheral devices to the host computer. The Universal Serial Bus (USB) was introduced mainly to provide additional benefits over these earlier interfacing ports. USB is designed to allow many peripherals to be connected using a single standardized interface. It provides an expandable, fast, cost-effective, hot-pluggable, plug-and-play serial hardware interface that makes life easier for computer users by letting them plug different devices into a USB port and have them configured automatically. This thesis briefly describes the USB v1.1 architecture and generates a gate-level netlist from RTL code under timing, area, and power constraints. Applying these design constraints improved performance by 30%. The design was then implemented physically using the SoC Encounter EDI system, including chip-size estimation, power analysis, and routing of the clock signal to all flip-flops in the design. To reduce clock switching power, a register clustering algorithm (DBSCAN) was implemented. The implementation uses the TSMC 180nm technology library
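
The abstract names DBSCAN as the register clustering algorithm but gives no details. The sketch below shows plain DBSCAN applied to flip-flop placement coordinates. It is an assumption-laden illustration, not the thesis's implementation: the coordinates, the `eps` radius, the `min_pts` threshold, and the idea of clustering on (x, y) location alone are all hypothetical.

```python
from math import hypot

NOISE = -1  # label for registers that belong to no dense cluster


def dbscan(points, eps, min_pts):
    """Label each (x, y) point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)

    def neighbors(i):
        # All points within eps of point i, including i itself.
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if hypot(xi - xj, yi - yj) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # core point: keep expanding
                queue.extend(reachable)
    return labels


# Hypothetical flip-flop locations in micrometres.
ff_xy = [(0, 0), (1, 0), (0, 1), (1, 1),
         (10, 10), (11, 10), (10, 11),
         (50, 50)]
labels = dbscan(ff_xy, eps=1.5, min_pts=3)
print(labels)
```

In a flow like the one described, each resulting cluster could share a local clock buffer or clock-gating cell so that less clock wiring toggles, while registers labeled as noise would be handled individually. How the thesis maps clusters to clock structures is not stated in the abstract.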