247 research outputs found

    Synthesis of Clock Trees with Useful Skew based on Sparse-Graph Algorithms

    Get PDF
    Computer-aided design (CAD) for very large scale integration (VLSI) involve

    Timing Closure in Chip Design

    Get PDF
    Achieving timing closure is a major challenge to the physical design of a computer chip. Its task is to find a physical realization fulfilling the speed specifications. In this thesis, we propose new algorithms for the key tasks of performance optimization, namely repeater tree construction; circuit sizing; clock skew scheduling; threshold voltage optimization and plane assignment. Furthermore, a new program flow for timing closure is developed that integrates these algorithms with placement and clocktree construction. For repeater tree construction a new algorithm for computing topologies, which are later filled with repeaters, is presented. To this end, we propose a new delay model for topologies that not only accounts for the path lengths, as existing approaches do, but also for the number of bifurcations on a path, which introduce extra capacitance and thereby delay. In the extreme cases of pure power optimization and pure delay optimization the optimum topologies regarding our delay model are minimum Steiner trees and alphabetic code trees with the shortest possible path lengths. We presented a new, extremely fast algorithm that scales seamlessly between the two opposite objectives. For special cases, we prove the optimality of our algorithm. The efficiency and effectiveness in practice is demonstrated by comprehensive experimental results. The task of circuit sizing is to assign millions of small elementary logic circuits to elements from a discrete set of logically equivalent, predefined physical layouts such that power consumption is minimized and all signal paths are sufficiently fast. In this thesis we develop a fast heuristic approach for global circuit sizing, followed by a local search into a local optimum. Our algorithms use, in contrast to existing approaches, the available discrete layout choices and accurate delay models with slew propagation. The global approach iteratively assigns slew targets to all source pins of the chip and chooses a discrete layout of minimum size preserving the slew targets. In comprehensive experiments on real instances, we demonstrate that the worst path delay is within 7% of its lower bound on average after a few iterations. The subsequent local search reduces this gap to 2% on average. Combining global and local sizing we are able to size more than 5.7 million circuits within 3 hours. For the clock skew scheduling problem we develop the first algorithm with a strongly polynomial running time for the cycle time minimization in the presence of different cycle times and multi-cycle paths. In practice, an iterative local search method is much more efficient. We prove that this iterative method maximizes the worst slack, even when restricting the feasible schedule to certain time intervals. Furthermore, we enhance the iterative local approach to determine a lexicographically optimum slack distribution. The clock skew scheduling problem is then generalized to allow for simultaneous data path optimization. In fact, this is a time-cost tradeoff problem. We developed the first combinatorial algorithm for computing time-cost tradeoff curves in graphs that may contain cycles. Starting from the lowest-cost solution, the algorithm iteratively computes a descent direction by a minimum cost flow computation. The maximum feasible step length is then determined by a minimum ratio cycle computation. This approach can be used in chip design for several optimization tasks, e.g. threshold voltage optimization or plane assignment. Finally, the optimization routines are combined into a timing closure flow. Here, the global placement is alternated with global performance optimization. Netweights are used to penalize the length of critical nets during placement. After the global phase, the performance is improved further by applying more comprehensive optimization routines on the most critical paths. In the end, the clock schedule is optimized and clocktrees are inserted. Computational results of the design flow are obtained on real-world computer chips

    ์ €์ „๋ ฅ ๊ณ ์„ฑ๋Šฅ ๋””์ง€ํ„ธ ์‹œ์Šคํ…œ์„ ์œ„ํ•œ ๊ณ ์‹ ๋ขฐ๋„์˜ ํด๋Ÿญ ๋„คํŠธ์›Œํฌ ์„ค๊ณ„ ๋ฐฉ๋ฒ•๋ก 

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ (๋ฐ•์‚ฌ)-- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ์ „๊ธฐยท์ปดํ“จํ„ฐ๊ณตํ•™๋ถ€, 2015. 8. ๊น€ํƒœํ™˜.์˜ค๋Š˜๋‚ ์˜ ํšŒ๋กœ ์„ค๊ณ„์—์„œ ๊ณต์ •๋ณ€์ด๊ฐ€ ํšŒ๋กœ ํด๋Ÿญ์˜ ํƒ€์ด๋ฐ์˜ ๋ณ€์ด์— ๋ฏธ์น˜๋Š” ์˜ํ–ฅ์€ ๋งค์šฐ ์ปค์ง์— ๋”ฐ๋ผ, ์ „ํ†ต์ ์œผ๋กœ ์‚ฌ์šฉ๋˜๋˜ ํด๋Ÿญ ํŠธ๋ฆฌ ๊ตฌ์กฐ๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•œ ํด๋Ÿญ ๋„คํŠธ์›Œํฌ๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์€ ํ•œ๊ณ„์— ๋ถ€๋”ชํžˆ๊ฒŒ ๋˜์—ˆ๊ณ , ์ด๋ฅผ ๊ทน๋ณตํ•˜๊ธฐ ์œ„ํ•œ ์—ฌ๋Ÿฌ๊ฐ€์ง€ ๊ธฐ์ˆ ๋“ค์ด ์ œ์•ˆ๋˜์—ˆ๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ๋ณ€์ด์— ๊ฐ•ํ•œ ํด๋Ÿญ ๋„คํŠธ์›Œํฌ๋ฅผ ์„ค๊ณ„ํ•˜๊ธฐ ์œ„ํ•ด, ์—ฐ๊ตฌ ๋ฐ ์‚ฌ์šฉ๋˜๊ณ  ์žˆ๋Š” ์„ธ ๊ฐ€์ง€ ๊ธฐ์ˆ ์— ๋Œ€ํ•ด ์†Œ๊ฐœํ•˜๊ณ , ์ด๋“ค์„ ๊ฐœ์„ ํ•œ ์—ฐ๊ตฌ๋“ค์„ ์ œ์•ˆํ•œ๋‹ค. ์ฒซ์งธ๋กœ, ์ด ๋…ผ๋ฌธ์—์„œ๋Š” ํด๋Ÿญ์˜ ํƒ€์ด๋ฐ ๋ฌธ์ œ๋ฅผ ํšŒ๋กœ ์ œ์ž‘ ์ดํ›„ ๋‹จ๊ณ„์—์„œ ์กฐ์ •ํ•  ์ˆ˜ ์žˆ๋Š” ํฌ์ŠคํŠธ ์‹ค๋ฆฌ์ฝ˜ ์กฐ์ • ํด๋Ÿญ ๋ฒ„ํผ๋ฅผ ๋ฐฐ์น˜ํ•˜๋Š” ๋ฌธ์ œ์— ๋Œ€ํ•ด ์„œ์ˆ ํ•œ๋‹ค. ํฌ์ŠคํŠธ ์‹ค๋ฆฌ์ฝ˜ ์กฐ์ • ๋ฒ„ํผ๋Š” ํด๋Ÿญ์˜ ์ง€์—ฐ์‹œ๊ฐ„์„ ํšŒ๋กœ๊ฐ€ ์ œ์ž‘๋œ ์ดํ›„์˜ ๋‹จ๊ณ„์—์„œ ์กฐ์ •ํ•˜ ์—ฌ ํด๋Ÿญ์˜ ํƒ€์ด๋ฐ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•  ์ˆ˜ ์žˆ์ง€๋งŒ, ๋ฒ„ํผ ์ž์ฒด์˜ ํฌ๊ธฐ ๋•Œ๋ฌธ์— ์ตœ์†Œํ•œ์˜ ๊ฐœ์ˆ˜๋งŒ ๊ฐ€์žฅ ํšจ์œจ์ ์ธ ์œ„์น˜์— ๋ฐฐ์น˜ํ•ด์•ผ ํ•˜๋Š” ๋ฌธ์ œ๊ฐ€ ์žˆ๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ์ด์ „์˜ ์—ฐ๊ตฌ๊ฐ€ ํšŒ๋กœ์˜ ์ˆ˜์œจ์„ ๊ณ„์‚ฐํ•  ๋•Œ ์‹œ๊ฐ„์ด ๋งŽ์ด ๊ฑธ๋ฆฌ๋Š” ๋ชฌํ…Œ-์นด๋ฅผ๋กœ ์‹œ๋ฎฌ๋ ˆ์ด์…˜์„ ์‚ฌ์šฉํ•˜๊ธฐ ๋•Œ๋ฌธ์— ํƒ์ƒ‰ ๊ฐ€๋Šฅํ•œ ํฌ์ŠคํŠธ ์‹ค๋ฆฌ์ฝ˜ ์กฐ์ • ๋ฒ„ํผ์˜ ๋ฐฐ์น˜๊ฐ€ ์ œํ•œ๋˜๋Š” ๋ฌธ์ œ๊ฐ€ ์žˆ์Œ์„ ์ง€์ ํ•œ ํ›„, ๊ธฐ์กด์— ์ œ์•ˆ๋˜์—ˆ๋˜ ๊ทธ๋ž˜ํ”„ ๊ธฐ๋ฐ˜ ํšŒ๋กœ ์ˆ˜์œจ ๊ณ„์‚ฐ ๊ธฐ๋ฒ•์„ ์‚ฌ์šฉํ•˜์—ฌ ํšจ์œจ์ ์ธ ํฌ์ŠคํŠธ ์‹ค๋ฆฌ์ฝ˜ ์กฐ์ • ๋ฒ„ํผ ๋ฐฐ์น˜๋ฅผ ์ฐพ์„ ์ˆ˜ ์žˆ๋Š” ์ ์ง„์ ์ด๊ณ  ์ฒด๊ณ„์ ์ธ ๋ฐฉ๋ฒ•์„ ์ œ์‹œํ•œ๋‹ค. ๋‹ค์Œ์€ ํด๋Ÿญ ์‹œ์ฐจ ์Šค์ผ€์ฅด๋ง ๋ฐฉ๋ฒ•์— ๋Œ€ํ•œ ์—ฐ๊ตฌ๋ฅผ ์„œ์ˆ ํ•œ๋‹ค. ์ตœ๊ทผ์˜ ์—ฐ๊ตฌ์—์„œ ์ œ์•ˆ๋˜์—ˆ๋˜, ํ”Œ๋ฆฝ-ํ”Œ๋กญ์˜ ํด๋Ÿญ์—์„œ ์ถœ๋ ฅ๊นŒ์ง€์˜ ๋”œ๋ ˆ์ด๊ฐ€ ํด๋Ÿญ์˜ ์ค€๋น„์‹œ๊ฐ„๊ณผ ์œ ์ง€์‹œ๊ฐ„์— ์˜์กดํ•œ๋‹ค๋Š” ์œ ์—ฐํ•œ ํ”Œ๋ฆฝ-ํ”Œ๋กญ ํƒ€์ด๋ฐ ๋ชจ๋ธ ์—ฐ๊ตฌ๋Š” ๊ธฐ์กด์˜ ํ”Œ๋ฆฝ-ํ”Œ๋กญ์˜ ํƒ€์ด๋ฐ ํŠน์„ฑ๋“ค์ด ๊ณ ์ •๋œ ๊ฐ’์ด๋ผ๋Š” ๊ฐ€์ •์— ๊ธฐ๋ฐ˜ํ•œ ์ •์  ํƒ€์ด๋ฐ ๋ถ„์„์˜ ์ •ํ™•์„ฑ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•  ์ˆ˜ ์žˆ๋Š” ์ค‘์š”ํ•œ ์—ฐ๊ตฌ์ด๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ์ƒˆ๋กœ์šด ๋ชจ๋ธ์„ ๊ณ ๋ คํ•˜์—ฌ, ์ด์ „์— ๊ณ ์ „์ ์ธ ํ”Œ๋ฆฝ-ํ”Œ๋กญ ํƒ€์ด๋ฐ ํŠน์„ฑ ๋ชจ๋ธ์„ ๊ธฐ๋ฐ˜์œผ๋กœ ์ง„ํ–‰๋˜์—ˆ๋˜ ํด๋Ÿญ ์‹œ์ฐจ ์Šค์ผ€์ฅด๋ง์˜ ์ตœ์ ํ™” ๋ฌธ์ œ๋ฅผ ์œ ์—ฐํ•œ ํ”Œ๋ฆฝ-ํ”Œ๋กญ ํƒ€์ด๋ฐ ๋ชจ๋ธ์„ ๊ณ ๋ คํ•˜์—ฌ ํ•ด๊ฒฐํ•˜์˜€๋‹ค. ๋ณธ ์—ฐ๊ตฌ์—์„œ๋Š” ์ฃผ์–ด์ง„ ํšŒ๋กœ์˜ ์ค€๋น„์‹œ๊ฐ„๊ณผ ์œ ์ง€์‹œ๊ฐ„์˜ ์—ฌ์œ ์‹œ๊ฐ„์„ ๋ฐ˜๋ณต์ ์ด๊ณ  ์ฒด๊ณ„์ ์œผ๋กœ ์ตœ๋Œ€ํ™”ํ•˜์—ฌ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜์˜€๋‹ค. ๋งˆ์ง€๋ง‰์œผ๋กœ ํด๋Ÿญ ์ŠคํŒŒ์ธ ๋„คํŠธ์›Œํฌ์˜ ํ•ฉ์„ฑ์„ ์ž๋™ํ™”ํ•˜๋Š” ๋ฌธ์ œ์— ๋Œ€ํ•ด ์„œ์ˆ ํ•œ๋‹ค. ์ „ํ†ต์ ์ธ ํด๋Ÿญ ํŠธ๋ฆฌ ๊ตฌ์กฐ๊ฐ€ ๊ณต์ •๋ณ€์ด ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜์ง€ ๋ชปํ–ˆ๊ธฐ ๋•Œ๋ฌธ์— ํด๋Ÿญ ๋ฉ”์‰ฌ๋ฅผ ํฌํ•จํ•˜๋Š” ๋‹ค์–‘ํ•œ ๋Œ€์•ˆ์  ๊ตฌ์กฐ๊ฐ€ ์ œ์•ˆ๋˜์—ˆ๋‹ค. ํด๋Ÿญ ๋ฉ”์‰ฌ์˜ ๊ฒฝ์šฐ ๊ณต์ •๋ณ€์ด์— ์˜ํ•œ ํด๋Ÿญ ์‹œ์ฐจ๋ฅผ ์ค„์ผ ์ˆ˜ ์žˆ์—ˆ์ง€๋งŒ ์ด๋ฅผ ์œ„ํ•ด ์™€์ด์–ด๋‚˜ ๋ฒ„ํผ ๋“ฑ์˜ ์ž์›์„ ๋งŽ์ด ์†Œ๋ชจํ•˜๋Š” ๋ฌธ์ œ๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ๋‹ค. ๋‘ ๊ตฌ์กฐ์˜ ์ค‘๊ฐ„์  ๊ตฌ์กฐ์—๋Š” ํด๋Ÿญ ํŠธ๋ฆฌ์˜ ๋…ธ๋“œ๋ฅผ ์—ฐ๊ฒฐํ•˜๋Š” ํฌ๋กœ์Šค ๋งํฌ๋ฅผ ์‚ฝ์ž…ํ•˜๋Š” ๊ตฌ์กฐ์™€ ํด๋Ÿญ ์ŠคํŒŒ์ธ ๊ตฌ์กฐ๊ฐ€ ์žˆ๋‹ค. ํด๋Ÿญ ํŠธ๋ฆฌ์— ์ ์ง„์ ์ธ ์ˆ˜์ •์„ ๊ฐ€ํ•˜์—ฌ ๋งŒ๋“œ๋Š” ํฌ๋กœ์Šค ๋งํฌ์™€ ๋‹ฌ๋ฆฌ, ํด๋Ÿญ ์ŠคํŒŒ์ธ ๊ตฌ์กฐ๋Š” ํŠธ๋ฆฌ๋‚˜ ์ดํ›„์— ์ œ์•ˆ๋œ ๋ฉ”์‰ฌ์™€๋Š” ์™„์ „ํžˆ ๋ณ„๊ฐœ์˜ ๊ตฌ์กฐ๋กœ, ์ด๋ฅผ ํ•ฉ์„ฑํ•˜๋Š” ๋ฐฉ๋ฒ•๋„ ๋งค์šฐ ๋‹ค๋ฅด๋‹ค. ๊ทธ๋ ‡๊ธฐ ๋•Œ๋ฌธ์— ํด๋Ÿญ ์ŠคํŒŒ์ธ์„ ํ•ฉ์„ฑํ•˜๋Š” ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ํ•„์ˆ˜์ ์ด๋ผ๊ณ  ํ•  ์ˆ˜ ์žˆ์œผ๋‚˜, ํ•ฉ์„ฑ ๋ฐฉ๋ฒ•๋ก ์ด๋‚˜ ์ด๋ฅผ ์ž๋™ํ™”ํ•˜๋Š” ๋ฐฉ๋ฒ•์— ๊ด€ํ•œ ์—ฐ๊ตฌ๋Š” ์•„์ง ์—†๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ์šฐ์„ , ํด๋Ÿญ-๊ฒŒ์ดํŒ…์„ ์ง€์›ํ•˜๋Š” ํด๋Ÿญ ์ŠคํŒŒ์ธ์„ ์ฃผ์–ด์ง„ ํด๋Ÿญ ์‹œ์ฐจ ๋ฐ ํด๋Ÿญ ์Šฌ๋ฃจ ์กฐ๊ฑด์„ ๋งŒ์กฑํ•˜๋ฉด์„œ ์ž์› ๋ฐ ์ „๋ ฅ ์†Œ๋ชจ๋Ÿ‰์„ ์ตœ์†Œํ™”ํ•˜๋Š” ๋ฌธ์ œ์— ๋Œ€ํ•ด ์„œ์ˆ ํ•œ๋‹ค. ๊ทธ๋ฆฌ๊ณ , ํšŒ๋กœ์—์„œ ์ฃผ์–ด์ง„ ํ”Œ๋ฆฝ-ํ”Œ๋กญ๋“ค์„ ํด๋Ÿญ-๊ฒŒ์ดํŒ… ์กฐ๊ฑด์—์„œ์˜ ์—ฐ๊ด€์„ฑ์„ ๊ณ ๋ คํ•˜๊ณ  ์กฐ์งํ™”ํ•˜์—ฌ ํด๋Ÿญ ์ŠคํŒŒ์ธ์„ ์‚ฝ์ž…ํ•œ ํ›„, ํด๋Ÿญ ์‹œ์ฐจ ๋ฐ ์Šฌ๋ฃจ ์กฐ๊ฑด์„ ๊ณ ๋ คํ•˜์—ฌ ๋ฒ„ํผ๋ฅผ ์‚ฝ์ž…ํ•˜๋Š” ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์ œ์•ˆํ•œ๋‹ค. ์š”์•ฝํ•˜๋ฉด, ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ํด๋Ÿญ์˜ ํƒ€์ด๋ฐ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด ํฌ์ŠคํŠธ-์‹ค๋ฆฌ์ฝ˜ ์กฐ์ • ํด๋Ÿญ ๋ฒ„ํผ๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ํ…Œํฌ๋‹‰๊ณผ ํด๋Ÿญ ์‹œ์ฐจ ์Šค์ผ€์ฅด๋ง์„ ์œ ์—ฐํ•œ ํ”Œ๋ฆฝ-ํ”Œ๋กญ ํƒ€์ด๋ฐ ๋ชจ๋ธ์—์„œ ์ ์šฉํ•˜๋Š” ํ…Œํฌ๋‹‰์„ ์ œ์‹œํ•˜๊ณ , ํด๋Ÿญ์˜ ํƒ€์ด๋ฐ ๋ฌธ์ œ์™€ ์ „๋ ฅ ์†Œ๋ชจ ๋ฌธ์ œ๋ฅผ ํ•œ๋ฒˆ์— ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•œ ์ƒˆ๋กœ์šด ํด๋Ÿญ ์ŠคํŒŒ์ธ ๋„คํŠธ์›Œํฌ๋ฅผ ํ•ฉ์„ฑํ•˜๋Š” ์ž๋™ํ™” ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์ œ์‹œํ•œ๋‹ค.As the process variation is dominating to cause the clock timing variation among chips to be much large, conventional clock tree based clock network is not able to guarantee the timing constraint of a digital system. To overcome the limitations of traditional clock design techniques, various techniques have been studied. This dissertation addresses three techniques that have been widely used for designing robust clock network and proposes developed methods. First, it is widely accepted that post-silicon tunable (PST) clock buffers can effectively resolve the clock timing violation. Since PST buffers, which can reset the clock delay to flip-flops after the chip is manufactured, impose a non-trivial implementation area and control circuitry, it is very important to minimally allocate PST buffers while satisfying the chip yield constraint. In this dissertation, we (1) develop a graph-based chip yield computation technique which can update yields very efficiently and accurately for incremental PST buffer allocation, based on which we (2) propose a systematic (bottom-up and top-down with refinement) PST buffer allocation algorithm that is able to fully explore the design space of PST buffer allocation. Second, clock skew scheduling is one of the essential steps that must be carefully performed during the design process. This dissertation addresses the clock skew optimization problem integrated with the consideration of the interdependent relation between the setup and hold skews, and clk-to-Q delay of flip-flops, so that the time margin is more accurately and reliably set aside over that of the previous methods, which have never taken the integrated problem into account. Precisely, based on an accurate flexible model of setup skew, hold skew, and clk-to-Q delay, we propose a stepwise clock skew scheduling technique in which at each iteration, the worst slack of setup and hold skews is systematically and incrementally relaxed to maximally extend the time margin. Lastly, clock tree with cross links and clock spine have an intermediate characteristics for skew tolerance and power consumption, compared to clock tree and clock mesh which are two extreme structures of clock network. Unlike the clock tree with links between clock nodes, which is a sort of an incremental modification of the structure of clock tree, clock spine network is a completely separated structure from the structures of tree and mesh. Consequently, it is necessary and essential to develop a synthesis algorithm for clock spines, which will be compatible to the existing synthesis algorithms of clock trees and clock meshes. To this end, this dissertation first addresses the problem of automating the synthesis of clock-gated clock spines with the objective of minimizing total clock power while meeting the clock skew and slew constraints. The key idea of our proposed synthesis algorithm is to identify and group the flip-flops with tight correlation of clock-gating operations together to form a spine while accurately predicting and maintaining clock skew and slew variations through the buffer insertion and stub allocation. In summary, this dissertation presents clock tuning techniques with consideration of post-silicon tuning, flexible flip-flop timing model, and clock-gated clock spine synthesis algorithm.Abstract i Chapter 1 INTRODUCTION 1 1.1 Clock Distribution Network . . . . . . . . . . . . . . . . . . . . . 1 1.2 Process Variation . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.3 Flexible Flip-flop Timing Model . . . . . . . . . . . . . . . . . . . 3 1.4 Clock Spine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.5 Contributions of This Dissertation . . . . . . . . . . . . . . . . . 6 Chapter 2 POST-SILICON TUNABLE CLOCK BUFFER ALLOCATION BASED ON FAST CHIP YIELD COMPUTATION 8 2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 2.2 Systematic Exploration of PST Buffer Allocation . . . . . . . . . 10 2.2.1 Observations . . . . . . . . . . . . . . . . . . . . . . . . . 10 2.2.2 Problem Definition . . . . . . . . . . . . . . . . . . . . . . 15 2.2.3 Allocation Algorithm . . . . . . . . . . . . . . . . . . . . . 16 2.3 Fast Timing Yield Computation . . . . . . . . . . . . . . . . . . 17 2.3.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . 17 2.3.2 Incremental Yield Computation . . . . . . . . . . . . . . . 22 2.4 Experimental Result . . . . . . . . . . . . . . . . . . . . . . . . . 24 2.5 PST Buffer Configuration Techniques . . . . . . . . . . . . . . . 31 2.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 Chapter 3 POST-SILICON TUNING BASED ON FLEXIBLE FLIP-FLOP TIMING 34 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 3.2 Preliminary and Definitions . . . . . . . . . . . . . . . . . . . . . 40 3.2.1 Flexible Flip-Flop Timing Model . . . . . . . . . . . . . . 40 3.2.2 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . 40 3.3 Motivational Examples . . . . . . . . . . . . . . . . . . . . . . . . 42 3.4 Clock Skew Scheduling for Slack Relaxation Based on Flexible Flip-Flop Timing . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 3.4.1 Overall Flow . . . . . . . . . . . . . . . . . . . . . . . . . 46 3.4.2 Finding Local Clock Skew Schedule . . . . . . . . . . . . 48 3.5 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . 51 3.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 Chapter 4 SYNTHESIS FOR POWER-AWARE CLOCK SPINES 61 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 4.2 Preliminaries and Motivation . . . . . . . . . . . . . . . . . . . . 64 4.2.1 Clock Spine . . . . . . . . . . . . . . . . . . . . . . . . . . 64 4.2.2 Activity Patterns . . . . . . . . . . . . . . . . . . . . . . . 67 4.2.3 Power Computation . . . . . . . . . . . . . . . . . . . . . 67 4.3 Algorithm for Clock Spine Synthesis . . . . . . . . . . . . . . . . 68 4.3.1 Problem Definition . . . . . . . . . . . . . . . . . . . . . . 68 4.3.2 Power-Aware Sink Clustering . . . . . . . . . . . . . . . . 70 4.3.3 Spine Relaxation . . . . . . . . . . . . . . . . . . . . . . . 77 4.3.4 Spine Buffer Allocation . . . . . . . . . . . . . . . . . . . 80 4.3.5 Top-Level Tree Construction . . . . . . . . . . . . . . . . 86 4.4 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . 86 4.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91 Chapter 5 CONCLUSION 95 5.1 Chapter 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95 5.2 Chapter 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95 5.3 Chapter 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96 Bibliography 97 ์ดˆ๋ก 106Docto

    Physical design of USB1.1

    Get PDF
    In earlier days, interfacing peripheral devices to host computer has a big problematic. There existed so many different kindsโ€™ ports like serial port, parallel port, PS/2 etc. And their use restricts many situations, Such as no hot-pluggability and involuntary configuration. There are very less number of methods to connect the peripheral devices to host computer. The main reason that Universal Serial Bus was implemented to provide an additional benefits compared to earlier interfacing ports. USB is designed to allow many peripheral be connecting using single standardize interface. It provides an expandable fast, cost effective, hot-pluggable plug and play serial hardware interface that makes life of computer user easier allowing them to plug different devices to into USB port and have them configured automatically. In this thesis demonstrated the USB v1.1 architecture part in briefly and generated gate level net list form RTL code by applying the different constraints like timing, area and power. By applying the various types design constraints so that the performance was improved by 30%. And then it implemented in physically by using SoC encounter EDI system, estimation of chip size, power analysis and routing the clock signal to all flip-flops presented in the design. To reduce the clock switching power implemented register clustering algorithm (DBSCAN). In this design implementation TSMC 180nm technology library is used

    Variation and power issues in VLSI clock networks

    Get PDF
    Clock Distribution Network (CDN) is an important component of any synchronous logic circuit. The function of CDN is to deliver the clock signal to the clock sinks. Clock skew is defined as the difference in the arrival time of the clock signal at the clock sinks. Higher uncertainty in skew (due to PVT variations) degrades circuit performance by decreasing the maximum possible delay between any two sequential elements. Aggressive frequency scaling has also led to high power consumption especially in CDN. This dissertation addresses variation and power issues in the design of current and potential future CDN. The research detailed in this work presents algorithmic techniques for the following problems: (1) Variation tolerance in useful skew design, (2) Link insertion for buffered clock nets, (3) Methodology and algorithms for rotary clocking and (4) Clock mesh optimization for skew-power trade off. For clock trees this dissertation presents techniques to integrate the different aspects of clock tree synthesis (skew scheduling, abstract topology and layout embedding) into one framework- tolerance to variations. This research addresses the issues involved in inserting cross-links in a buffered clock tree and proposes design criteria to avoid the risk of short-circuit current. Rotary clocking is a promising new clocking scheme that consists of unterminated rings formed by differential transmission lines. Rotary clocking achieves reduction in power dissipation clock skew. This dissertation addresses the issues in adopting current CAD methodology to rotary clocks. Alternative methodology and corresponding algorithmic techniques are detailed. Clock mesh is a popular form of CDN used in high performance systems. The problem of simultaneous sizing and placement of mesh buffers in a clock mesh is addressed. The algorithms presented remove the edges from the clock mesh to trade off skew tolerance for low power. For clock trees as well as link insertion, our experiments indicate significant reduction in clock skew due to variations. For clock mesh, experimental results indicate 18.5% reduction in power with 1.3% delay penalty on a average. In summary, this dissertation details methodologies/algorithms that address two critical issues- variation and power dissipation in current and potential future CDN

    Design methodology and productivity improvement in high speed VLSI circuits

    Get PDF
    2017 Spring.Includes bibliographical references.To view the abstract, please see the full text of the document

    Enhancing Power Efficient Design Techniques in Deep Submicron Era

    Get PDF
    Excessive power dissipation has been one of the major bottlenecks for design and manufacture in the past couple of decades. Power efficient design has become more and more challenging when technology scales down to the deep submicron era that features the dominance of leakage, the manufacture variation, the on-chip temperature variation and higher reliability requirements, among others. Most of the computer aided design (CAD) tools and algorithms currently used in industry were developed in the pre deep submicron era and did not consider the new features explicitly and adequately. Recent research advances in deep submicron design, such as the mechanisms of leakage, the source and characterization of manufacture variation, the cause and models of on-chip temperature variation, provide us the opportunity to incorporate these important issues in power efficient design. We explore this opportunity in this dissertation by demonstrating that significant power reduction can be achieved with only minor modification to the existing CAD tools and algorithms. First, we consider peak current, which has become critical for circuit's reliability in deep submicron design. Traditional low power design techniques focus on the reduction of average power. We propose to reduce peak current while keeping the overhead on average power as small as possible. Second, dual Vt technique and gate sizing have been used simultaneously for leakage savings. However, this approach becomes less effective in deep submicron design. We propose to use the newly developed process-induced mechanical stress to enhance its performance. Finally, in deep submicron design, the impact of on-chip temperature variation on leakage and performance becomes more and more significant. We propose a temperature-aware dual Vt approach to alleviate hot spots and achieve further leakage reduction. We also consider this leakage-temperature dependency in the dynamic voltage scaling approach and discover that a commonly accepted result is incorrect for the current technology. We conduct extensive experiments with popular design benchmarks, using the latest industry CAD tools and design libraries. The results show that our proposed enhancements are promising in power saving and are practical to solve the low power design challenges in deep submicron era

    High performance IC clock networks with grid and tree topologies

    Get PDF
    In this dissertation, an essential step in the integrated circuit (IC) physical design flowโ€”the clock network designโ€”is investigated. Clock network design entailsa series of computationally intensive, large-scale design and optimization tasks for the generation and distribution of the clock signal through different topologies. The lack or inefficacy of the automation for implementing high performance clock networks, especially for low-power, high speed and variation-aware implementations, is the main driver for this research. The synthesis and optimization methods for the two most commonly used clock topologies in IC designโ€”the grid topology and the tree topologyโ€”are primarily investigated.The clock mesh network, which uses the grid topology, has very low skew variation at the cost of high power dissipation. Two novel clock mesh network designmethodologies are proposed in this dissertation in order to reduce the power dissipation. These are the first methods known in literature that combine clock meshsynthesis with incremental register placement and clock gating for power saving purposes. The application of the proposed automation methods on the emerging resonant rotary clocking technology, which also has the grid topology, is investigated in this dissertation as well.The clock tree topology has the advantage of lower power dissipation compared to other traditional clock topologies (e.g. clock mesh, clock spine, clock tree with cross links) at the cost of increased performance degradation due to on-chip variations. A novel clock tree buffer polarity assignment flow is proposed in this dissertation in order to reduce these effects of on-chip variations on the clock tree topology. The proposed polarity assignment flow is the first work that introduces post-silicon, dynamic reconfigurability for polarity assignment, enabling clock gating for low power operation of the variation-tolerant clock tree networks.Ph.D., Electrical Engineering -- Drexel University, 201
    • โ€ฆ
    corecore