213 research outputs found

    ASIC implemented MicroBlaze-based Coprocessor for Data Stream Management Systems

    Indiana University-Purdue University Indianapolis (IUPUI). The drastic increase in Internet usage demands real-time data processing with higher efficiency than ever before. The Symbiote Coprocessor Unit (SCU), developed by Dr. Pranav Vaidya, is a hardware accelerator with the potential to speed up data processing by up to 150x compared with traditional data stream processors. However, the SCU implementation is very complex, fixed, and uses an outdated host interface, which limits future improvement. Mr. Tareq S. Alqaisi, an MSECE graduate from IUPUI, worked on curbing these limitations. In his architecture, he used a Xilinx MicroBlaze microcontroller to reduce the complexity of the SCU, along with a few other modifications. The objective of this study is to make the SCU suitable for mass production while reducing its power consumption and delay. To accomplish this, the execution unit of the SCU has been implemented as an application-specific integrated circuit, and modules such as ACG/OCG, a sequential comparator, and a D-word multiplier/divider are integrated into the design. Furthermore, techniques such as operand isolation, buffer insertion, cell swapping, and cell resizing are also integrated into the system. As a result, the new design attains 67.9435 µW of dynamic power, compared with 74.0012 µW before power optimization, along with a small increase in static power, and a clock period of 39.47 ns, as opposed to 52.26 ns before timing optimization.
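    The relative improvements implied by the figures quoted in this abstract can be checked with a few lines of arithmetic. The sketch below uses only the four numbers stated above; the helper name `percent_reduction` is hypothetical.

```python
def percent_reduction(before: float, after: float) -> float:
    """Return the relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


# Figures quoted in the abstract (dynamic power in microwatts, clock period in ns).
dynamic_power_saving = percent_reduction(74.0012, 67.9435)
clock_period_saving = percent_reduction(52.26, 39.47)

print(f"dynamic power reduced by {dynamic_power_saving:.2f}%")
print(f"clock period reduced by {clock_period_saving:.2f}%")
```

    Under these figures the dynamic power drops by roughly 8% and the clock period by roughly 24%.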

    Crosstalk-driven interconnect optimization by simultaneous gate and wire sizing


    Timing Analysis and Design Rule Violation Prediction of Interconnects for Deep Sub-micron Circuit Design

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, 2021. 2. Taewhan Kim.

    Timing analysis and clearing design rule violations are essential steps for taping out a chip. However, they keep getting harder in deep sub-micron circuits because the variations of transistors and interconnects have been increasing and design rules have become more complex. This dissertation addresses two problems, timing analysis and design rule violations, in synthesizing deep sub-micron circuits.

    First, timing analysis at process corners cannot capture post-Si performance accurately because the slowest path at a process corner is not always the slowest one in the post-Si instances. In addition, the proportion of interconnect delay in the critical path on a chip is increasing and exceeds 20% in sub-10nm technologies, which means that, to capture post-Si performance accurately, the representative critical path circuit should reflect not only FEOL (front-end-of-line) but also BEOL (back-end-of-line) variations. Since the number of BEOL metal layers exceeds ten and the layers have variation in resistance and capacitance intermixed with resistance variation on the vias between them, a very high-dimensional design space exploration is necessary to synthesize a representative critical path circuit that provides an accurate performance prediction. To cope with this, I propose a BEOL-aware methodology for synthesizing a representative critical path circuit, which incrementally explores, starting from an initial path circuit on the post-Si target circuit, routing patterns (i.e., BEOL reconfiguration) as well as gate resizing on the path circuit. Specifically, the synthesis framework for the critical path circuit integrates a set of novel techniques: (1) extracting and classifying BEOL configurations to reduce design space complexity, (2) formulating BEOL random variables for fast and accurate timing analysis, and (3) exploring alternative (ring oscillator) circuit structures to extend the applicability of this work.

    Second, the complexity of design rules has been increasing, which results in more design rule violations during routing. In addition, the size of standard cells keeps decreasing, which makes routing harder. In the conventional P&R flow, the routability of a pre-routed layout is predicted from the routing congestion obtained from global routing, and then placement is optimized so as not to cause design rule violations. But this turned out to be inaccurate in advanced technology nodes, so it is necessary to predict routability with more features. I propose a methodology for predicting the hotspots of design rule violations (DRVs) using machine learning with placement-related features and the conventional routing congestion, and for perturbing placed cells to reduce the number of DRVs. Specifically, the hotspots are predicted by a pre-trained binary classification model, and placement perturbation is performed by global optimization methods to minimize the number of DRVs predicted by a pre-trained regression model. To do this, the framework is composed of three techniques: (1) dividing the circuit layout into multiple rectangular grids and extracting features such as pin density, cell density, global routing results (demand, capacity and overflow), and more in the placement phase, (2) predicting whether each grid has DRVs using a binary classification model, and (3) perturbing the placed standard cells in the hotspots to minimize the number of DRVs predicted by a regression model.

    Contents:
    1 Introduction: 1.1 Representative Critical Path Circuit; 1.2 Prediction of Design Rule Violations and Placement Perturbation; 1.3 Contributions of This Dissertation
    2 Methodology for Synthesizing Representative Critical Path Circuits reflecting BEOL Timing Variation: 2.1 Motivation; 2.2 Definitions and Overall Flow; 2.3 Techniques for BEOL-Aware RCP Generation (2.3.1 Clustering BEOL Configurations; 2.3.2 Formulating Statistical BEOL Random Variables; 2.3.3 Delay Modeling; 2.3.4 Exploring Ring Oscillator Circuit Structures); 2.4 Experimental Results; 2.5 Further Study on Variations
    3 Methodology for Reducing Routing Failures through Enhanced Prediction on Design Rule Violations in Placement: 3.1 Motivation; 3.2 Overall Flow; 3.3 Techniques for Reducing Routing Failures (3.3.1 Binary Classification; 3.3.2 Regression; 3.3.3 Optimization; 3.3.4 Placement Perturbation); 3.4 Experiments (3.4.1 Experiments Setup; 3.4.2 Hotspot Prediction; 3.4.3 Regression; 3.4.4 Placement Perturbation)
    4 Conclusions: 4.1 Synthesis of Representative Critical Path Circuits reflecting BEOL Timing Variation; 4.2 Reduction of Routing Failures through Enhanced Prediction on Design Rule Violations in Placement
    Abstract (in Korean)
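    The grid-based feature extraction described in this abstract (step 1 of the DRV framework) might look roughly like the following minimal sketch. It is not the dissertation's implementation: the `Cell` record, the `grid_features` helper, and the choice to assign each cell to the tile containing its center are all assumptions made for illustration, and only two of the listed features (cell density and pin density) are computed.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """Hypothetical placed standard cell: lower-left corner, size, and pin count."""
    x: float
    y: float
    width: float
    height: float
    num_pins: int


def grid_features(cells, layout_w, layout_h, nx, ny):
    """Split the layout into nx-by-ny rectangular tiles and accumulate
    per-tile cell density (area fraction) and pin density (pins per unit area).

    Simplifying assumption: a cell contributes entirely to the tile
    that contains its center point.
    """
    tile_w = layout_w / nx
    tile_h = layout_h / ny
    tile_area = tile_w * tile_h
    feats = [[{"cell_density": 0.0, "pin_density": 0.0} for _ in range(nx)]
             for _ in range(ny)]
    for c in cells:
        cx = c.x + c.width / 2.0
        cy = c.y + c.height / 2.0
        col = min(int(cx // tile_w), nx - 1)  # clamp cells on the right/top edge
        row = min(int(cy // tile_h), ny - 1)
        feats[row][col]["cell_density"] += (c.width * c.height) / tile_area
        feats[row][col]["pin_density"] += c.num_pins / tile_area
    return feats


example_cells = [Cell(0, 0, 2, 1, 3), Cell(6, 6, 1, 1, 2)]
example_feats = grid_features(example_cells, layout_w=10, layout_h=10, nx=2, ny=2)
```

    Each tile's feature dictionary would then be fed, together with global routing demand, capacity and overflow, to whatever classifier and regressor are used; those models are outside the scope of this sketch.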

    Lagrangian relaxation-based multi-threaded discrete gate sizer

    In integrated circuit design, gate sizing is one of the key optimization techniques, repeatedly invoked to trade off delay against the area and/or power of the gates during the logic design and physical design stages. With design sizes increasing to a million gates and larger, and with discrete gate sizes and non-convex delay models, gate sizing algorithms that were designed for continuous sizes and convex delay models are slow and inaccurate in timing. Of the several published discrete gate sizing algorithms, recent works have shown that Lagrangian relaxation based gate sizers produce designs with the lowest power on average and with high timing accuracy. But they are also very slow due to a large number of expensive timing updates spread across hundreds of iterations of solving the Lagrangian sub-problem. In this thesis we present a Lagrangian relaxation based multi-threaded discrete gate sizer for fast timing and power reduction by swapping the gate sizes and the threshold voltages. We developed two parallelization-enabling techniques to reduce the runtime of the Lagrangian sub-problem solver, namely, mutual exclusion edge (MEE) assignment and directed acyclic graph (DAG) based netlist traversal. MEEs are dummy edges assigned to reduce computational dependencies among gates sharing one or more common fan-ins. DAG based netlist traversal facilitates simultaneous resizing of gates belonging to different topological levels. We designed a Lagrange multiplier update framework that enables rapid convergence of the timing recovery and power recovery algorithms. To reduce the runtime of timing updates, we proposed a simple and fast-to-compute effective capacitance model and several mechanisms to calibrate the timing models to improve their accuracy. Compared to the state-of-the-art gate sizer, our proposed gate sizer is on average 15x faster and the optimized designs have only 1.7% higher power. In digital synchronous designs, simultaneous gate sizing and clock skew scheduling provides significantly more power saving. We extend the gate sizer to simultaneously schedule the clock skew. It can achieve an average of 18.8% more reduction in power with only a 20% increase in runtime.
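    As background for the DAG-based netlist traversal mentioned in this abstract, the sketch below shows one standard way to assign topological levels to the gates of a combinational netlist (Kahn's algorithm). It is only an illustration of levelization: the `fanouts` dictionary format and the function name are assumptions, and the abstract does not say how the thesis actually schedules gates across levels or threads.

```python
from collections import defaultdict, deque


def topological_levels(fanouts):
    """Group gates of a DAG netlist by topological level.

    `fanouts` maps each gate name to the list of gates it drives (hypothetical
    input format). A gate's level is the length of the longest path from any
    primary-input gate to it. Raises ValueError if the graph has a cycle.
    """
    nodes = set(fanouts)
    indegree = defaultdict(int)
    for outs in fanouts.values():
        for out in outs:
            indegree[out] += 1
            nodes.add(out)

    level = {g: 0 for g in nodes if indegree[g] == 0}
    queue = deque(sorted(level))
    processed = 0
    while queue:
        gate = queue.popleft()
        processed += 1
        for out in fanouts.get(gate, []):
            level[out] = max(level.get(out, 0), level[gate] + 1)
            indegree[out] -= 1
            if indegree[out] == 0:
                queue.append(out)

    if processed != len(nodes):
        raise ValueError("netlist contains a combinational cycle")

    groups = defaultdict(list)
    for gate, lvl in level.items():
        groups[lvl].append(gate)
    return [sorted(groups[lvl]) for lvl in sorted(groups)]


example_levels = topological_levels(
    {"a": ["c"], "b": ["c", "d"], "c": ["e"], "d": ["e"], "e": []}
)
print(example_levels)
```

    Gates within one level have no direct dependency on each other, which is what makes a level-by-level traversal a natural starting point for multi-threaded resizing.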

    Discrete Gate Sizing Methodologies for Delay, Area and Power Optimization

    The modeling of an individual gate and the optimization of circuit performance have long been critical issues in the VLSI industry. In this work, we first study the gate sizing problem for today's industrial designs and explore the contributions and limitations of the existing approaches, which mainly suffer from producing only continuous solutions, using outdated timing models, or running inefficiently. In this dissertation, we present a new discrete gate sizing technique that optimizes different aspects of circuit performance, including delay, area and power consumption. Our method is fast and efficient because it applies local search instead of global exhaustive search during gate size selection, which greatly reduces the search space and the computational complexity. It is also flexible with respect to timing models, and it is able to handle constraints on input/output slew and output load capacitance, which very few previous works have addressed. We then propose a new timing model, which is derived from the classic Elmore delay model but incorporates features of modern timing models from standard cell libraries. With our new timing model, we are able to formulate the combinatorial discrete sizing problem as a simplified mathematical expression and apply it to the existing Lagrangian relaxation method, which is shown to converge to the optimal solution. We demonstrate that gate sizing approaches based on the classic Elmore delay model can still be valid. Therefore, our work might provide a new look into the numerous Elmore delay model based research works in various areas (such as placement, routing, layout, buffer insertion, timing analysis, etc.).
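    For readers unfamiliar with the classic Elmore delay model that this abstract builds on, the sketch below computes it for the simplest case, an RC chain (ladder), where the delay to the last node is the sum over stages of each resistance times all capacitance downstream of it. This is the textbook model only, not the dissertation's new timing model; the function name and argument layout are assumptions.

```python
def elmore_delay_chain(resistances, capacitances):
    """Elmore delay from the driver to the last node of an RC chain.

    Stage k has series resistance resistances[k] followed by a grounded
    capacitance capacitances[k]. The delay is
        sum_k R_k * (C_k + C_{k+1} + ... + C_n).
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one capacitance per resistance")
    downstream_cap = 0.0
    delay = 0.0
    # Walk from the far end toward the driver, accumulating downstream capacitance.
    for r, c in zip(reversed(resistances), reversed(capacitances)):
        downstream_cap += c
        delay += r * downstream_cap
    return delay


# Two-stage example: R1*(C1 + C2) + R2*C2 = 1*(3 + 4) + 2*4
example_delay = elmore_delay_chain([1.0, 2.0], [3.0, 4.0])
print(example_delay)
```

    With resistances in ohms and capacitances in farads the result is in seconds; the example above uses unitless values purely for illustration.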

    Mesh-Based Clock Network Design Methodology

    Thesis (Ph.D.) -- Seoul National University Graduate School: Department of Electrical and Computer Engineering, 2015. 2. Taewhan Kim.

    The clock distribution network in a synchronous digital circuit delivers a clock signal to every storage element (i.e., clock sink) in the circuit. However, since continued technology scaling increases PVT (process-voltage-temperature) variation, the increase in clock skew variation is highly likely to cause performance degradation or system failure at run time. Recently, to mitigate clock skew variation, many researchers have taken a profound interest in the clock mesh network. However, though the structure of a clock mesh network is excellent at tolerating timing variation, it demands significantly high power consumption due to the use of excessive mesh wire and buffer resources. Thus, optimizing the resources required in mesh clock synthesis while maintaining the variation tolerance is crucially important. The three major tasks that greatly affect the cost of the resulting clock mesh are (1) mesh segment allocation, (2) mesh buffer allocation and sizing, and (3) clock sink binding to mesh segments. Previous clock mesh optimization approaches solve the three tasks sequentially, one at a time, to manage the run time complexity of the tasks at the expense of the quality of results. However, since the three tasks are tightly inter-related, simultaneously optimizing all three tasks is essential, when run time permits, to synthesize an economical clock mesh network.

    In this dissertation, we propose an approach that tackles the problem in an integrated fashion by combining the three tasks into an iterative framework of incremental updates and solving them simultaneously to find a globally optimal allocation of mesh resources while taking into account the clock skew tolerance constraints. The core parts of this dissertation are a precise analysis of the relation among the resource optimization tasks and the establishment of a mechanism for effective and efficient integration of the tasks. In particular, to handle the run time problem, we propose a set of speed-up techniques (i.e., modeling the RC circuit to eliminate redundant matrix multiplications, exploiting a sliding window scheme, and fast estimation of the buffer sizing effect), which are fitted into our context of fast clock skew estimation in mesh resource optimization, as well as early decision policies. In summary, this dissertation presents an efficient design methodology for clock mesh synthesis that considers the integration of the three tasks and the reduction of runtime complexity.

    Contents:
    Abstract; Contents; List of Figures; List of Tables
    1 Introduction: 1.1 Introduction; 1.2 Contributions of This Dissertation
    2 Background: 2.1 Clock Distribution Network; 2.2 Clock Network Topologies; 2.3 Design Metrics of Clock Network; 2.4 The Effects of Variations on Clock Skew
    3 Clock Mesh Synthesis Flow: 3.1 Elements of Clock Mesh; 3.2 Conventional Clock Mesh Synthesis Overview; 3.3 Initial Grid Generation; 3.4 Mesh Buffer Placement and Sizing; 3.5 Clock Mesh Optimization
    4 Integrated Resource Allocation and Binding in Clock Mesh Synthesis: 4.1 Introduction; 4.2 Observations; 4.3 Framework of Clock Mesh Optimization (4.3.1 Incremental Resource Updates; 4.3.2 Constraints for Variation Tolerance; 4.3.3 Early Decision Policies; 4.3.4 Time Complexity Analysis); 4.4 Fast Clock Skew Estimation Techniques (4.4.1 Partially Reusing Matrix Multiplication for Incremental Updates; 4.4.2 Adopting Sliding Window Scheme; 4.4.3 Adjusting Delay Caused by Buffer Resizing); 4.5 Experimental Results (4.5.1 Experimental Environments; 4.5.2 Resource Requirement and Variation Tolerance Comparison; 4.5.3 Comparison with Clock Mesh Optimization using Worst Case Timing Analysis of Commercial Tool; 4.5.4 Analysis of the Effect of Proposed Techniques; 4.5.5 Run Time Analysis; 4.5.6 Accuracy and Run Time of Fast Clock Skew Estimation; 4.5.7 Electromigration Analysis; 4.5.8 Run-time Analysis in Multi-thread Computing Environment); 4.6 Summary
    5 Conclusion
    Abstract in Korean

    Fuzzy simulated evolution algorithm for VLSI cell placement

    Placement is a major step encountered during the design of very large scale integrated circuits. It is a generalization of the quadratic assignment problem with numerous constraints, several objectives, and a very noisy solution space. Besides the NP-hard nature of this problem, many circuit parameters such as area, interconnect delays, wire requirements, etc. can only be imprecisely estimated before completing the remaining design automation steps and committing the circuit to silicon. Further, the best placement is usually one that combines several desirable physical characteristics. There has not been a consensus on how to accommodate all these (conflicting) requirements in the search for near optimal feasible solutions. In this paper, we present a fuzzy simulated evolution (FSE) algorithm to tackle this problem. Identification of near optimal solutions is achieved through a novel goal-directed fuzzy search approach. This approach can be followed by other iterative (meta-) heuristics to find desirable solutions to optimization problems with noisy search space and possibly more than one objective. This approach is dominance preserving, i.e. if a solution A dominates another solution B with respect to all objective criteria, then A will surely have a higher membership in the fuzzy set of good solutions than solution B. Further, the approach scales well with larger problem instances and/or a larger number of objective criteria. Also, the operators of all stages of simulated evolution have been implemented using fuzzy logic to exploit the nature of fuzzy information of the problem domain. Experiments with benchmark tests demonstrate a noticeable improvement in solution quality. (C) 2002 Published by Elsevier Science Ltd
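    The dominance-preserving property described in this abstract can be illustrated with a small sketch. The aggregation operator below (a weighted blend of the minimum and the mean of the per-objective memberships) is an assumption chosen for illustration; the abstract does not state which fuzzy operator the paper uses. With `beta` strictly below 1, a solution that is at least as good in every objective and strictly better in one always receives a strictly higher score.

```python
def fuzzy_goodness(memberships, beta=0.5):
    """Aggregate per-objective fuzzy memberships (each in [0, 1]) into one score.

    Hypothetical operator: beta * min + (1 - beta) * mean. The min term
    penalizes the worst objective; the mean term keeps the score sensitive
    to every objective, which is what preserves dominance.
    """
    if not memberships:
        raise ValueError("need at least one membership value")
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must be in [0, 1)")
    mean = sum(memberships) / len(memberships)
    return beta * min(memberships) + (1.0 - beta) * mean


# Solution A dominates solution B: no objective is worse, two are better.
solution_a = [0.8, 0.6, 0.9]   # e.g. memberships for wirelength, power, delay
solution_b = [0.7, 0.6, 0.5]
score_a = fuzzy_goodness(solution_a)
score_b = fuzzy_goodness(solution_b)
print(score_a > score_b)
```

    A pure `min` aggregation (beta equal to 1) would not have this property, since two solutions with the same worst objective would tie even when one dominates the other.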

    Overcoming the challenges in very deep submicron for area reduction, power reduction and faster design closure

    The project aims to understand an existing very deep sub-micron (VDSM) implementation of a digital design, to analyze it in terms of power, area and timing, and to develop solutions and strategies that optimize the implementation in these three respects. The effort involved understanding the constraints, reasons and requirements that led to the existing implementation of the design. Further, various experiments were carried out to improve the design's power, area and timing. The tradeoffs required and the benefits of each experiment were contrasted and analyzed. The solutions and strategies that best balance the requirements were tried out and are reported at the end of the report.

    Simulated evolution for timing and low power VLSI standard cell placement

    This paper presents a Fuzzy Simulated Evolution algorithm for VLSI standard cell placement with the objective of minimizing power, delay and area. For this hard multiobjective combinatorial optimization problem, no known exact and efficient algorithms exist that guarantee finding a solution of specific or desirable quality. Approximation iterative heuristics such as Simulated Evolution are best suited to perform an intelligent search of the solution space. Due to the imprecise nature of design information at the placement stage, the various objectives and constraints are expressed in the fuzzy domain. The search is made to evolve toward a vector of fuzzy goals. Variants of the algorithm, which include adaptive bias and biasless simulated evolution, are proposed and experimental results are presented. A comparison with a genetic algorithm is discussed. © 2003 Elsevier Ltd. All rights reserved.