
    Algorithmic techniques for nanometer VLSI design and manufacturing closure

    As Very Large Scale Integration (VLSI) technology moves into the nanoscale regime, design and manufacturing closure becomes very difficult to achieve due to increasing chip and power density. Imperfections due to process, voltage, and temperature variations aggravate the problem. Uncertainty in the electrical characteristics of individual devices and wires may cause significant performance deviations or even functional failures. These effects pose tremendous challenges to the continuation of Moore's law and the growth of the semiconductor industry. Efforts are needed in both the deterministic design stage and the variation-aware design stage. This research proposes innovative algorithms for both stages to obtain designs with high frequency, low power, and high robustness. For deterministic optimization, new buffer insertion and gate sizing techniques are proposed. For variation-aware optimization, new lithography-driven and post-silicon tuning-driven design techniques are proposed. For buffer insertion, a new slew buffering formulation is presented and proved to be NP-hard. Despite this, a highly efficient algorithm that runs over 90x faster than the best alternatives is proposed. The algorithm is also extended to handle continuous buffer locations and blockages. For gate sizing, a new algorithm is proposed to handle discrete gate libraries, in contrast to the unrealistic continuous gate libraries assumed by most existing algorithms. Our approach is a continuous-solution-guided dynamic programming approach, which combines the high solution quality of dynamic programming with the short runtime of rounding a continuous solution. For lithography-driven optimization, the problem of cell placement considering manufacturability is studied. Three algorithms are proposed to handle cell flipping and relocation. They are based on dynamic programming and graph-theoretic approaches, and provide different tradeoffs between variation reduction and wirelength increase. For post-silicon tuning-driven optimization, the problem of unified adaptivity optimization on logical and clock signal tuning is studied, which enables significant resource savings. The new algorithm is based on a novel linear programming formulation solved by an advanced robust linear programming technique. The continuous solution is then discretized using binary-search-accelerated dynamic programming, batch-based optimization, and Latin Hypercube sampling based fast simulation.
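To make the buffering flavor concrete, the sketch below walks van Ginneken-style candidate propagation along a single source-to-sink wire path, with dominated-candidate pruning at each step. It is only a minimal illustration of the dynamic-programming machinery such buffer insertion algorithms build on, not the paper's slew-constrained, 90x-faster algorithm; all device constants and the example path are invented.

```python
from dataclasses import dataclass

# Hypothetical Elmore-style constants; a real flow reads these from a library.
R_WIRE, C_WIRE = 0.5, 0.2            # per-unit wire resistance / capacitance
R_BUF, C_BUF, D_BUF = 1.0, 0.1, 2.0  # buffer output R, input C, intrinsic delay

@dataclass(frozen=True)
class Candidate:
    cap: float  # downstream capacitance seen at this node
    rat: float  # required arrival time at this node

def add_wire(cands, length):
    """Propagate candidates across a wire segment using the Elmore delay."""
    r, c = R_WIRE * length, C_WIRE * length
    return [Candidate(k.cap + c, k.rat - r * (c / 2 + k.cap)) for k in cands]

def add_buffer_option(cands):
    """At a legal buffer site, keep both the unbuffered and buffered choices."""
    buffered = [Candidate(C_BUF, k.rat - D_BUF - R_BUF * k.cap) for k in cands]
    return cands + buffered

def prune(cands):
    """Drop dominated candidates: keep only the (cap, rat) Pareto frontier."""
    best, frontier = float("-inf"), []
    for k in sorted(cands, key=lambda k: (k.cap, -k.rat)):
        if k.rat > best:
            frontier.append(k)
            best = k.rat
    return frontier

# Sink with load 1.0 and required time 50; three segments with buffer sites.
cands = [Candidate(cap=1.0, rat=50.0)]
for seg_len in [4.0, 6.0, 2.0]:
    cands = prune(add_buffer_option(add_wire(cands, seg_len)))
print(max(k.rat - R_BUF * k.cap for k in cands))  # best driver-side slack proxy
```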

    A Review of Bayesian Methods in Electronic Design Automation

    The utilization of Bayesian methods has been widely acknowledged as a viable solution for tackling various challenges in electronic integrated circuit (IC) design under stochastic process variation, including circuit performance modeling, yield/failure rate estimation, and circuit optimization. As the post-Moore era brings about new technologies (such as silicon photonics and quantum circuits), many of the associated issues are similar to those encountered in electronic IC design and can be addressed using Bayesian methods. Motivated by this observation, we present a comprehensive review of Bayesian methods in electronic design automation (EDA). By doing so, we hope to equip researchers and designers with the ability to apply Bayesian methods to solving stochastic problems in electronic circuits and beyond.
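As a minimal taste of the Bayesian machinery such a review covers, the sketch below computes a closed-form Beta posterior for a circuit failure rate from toy Monte Carlo pass/fail samples. The variation model, the spec limit, and the uniform prior are all illustrative assumptions, not anything taken from the paper.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Toy Monte Carlo: a circuit "fails" when a Gaussian process-variation
# parameter pushes a performance metric past its spec (numbers are made up).
n_samples = 200
delay = rng.normal(loc=1.0, scale=0.08, size=n_samples)  # normalized path delay
failures = int(np.sum(delay > 1.15))                     # spec-violation count

# Beta(1, 1) (uniform) prior on the failure rate p; a Binomial likelihood
# then gives a Beta(1 + failures, 1 + passes) posterior in closed form.
posterior = stats.beta(1 + failures, 1 + (n_samples - failures))

print(f"posterior mean failure rate: {posterior.mean():.4f}")
lo, hi = posterior.interval(0.95)
print(f"95% credible interval: [{lo:.4f}, {hi:.4f}]")
```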

    Interconnect Timing Analysis and Design Rule Violation Prediction for Deep Sub-Micron Circuit Design

    Ph.D. dissertation, Department of Electrical and Computer Engineering, College of Engineering, Seoul National University, February 2021. Advisor: Taewhan Kim. Timing analysis and clearing design rule violations are essential steps before taping out a chip. However, they keep getting harder in deep sub-micron circuits because the variations of transistors and interconnects have been increasing and design rules have become more complex. This dissertation addresses two problems in timing analysis and design rule violations for synthesizing deep sub-micron circuits. Firstly, timing analysis at process corners cannot capture post-silicon performance accurately, because the slowest path at a process corner is not always the slowest one in the post-silicon instances. In addition, the proportion of interconnect delay in the critical path of a chip is increasing, exceeding 20% in sub-10nm technologies; to capture post-silicon performance accurately, a representative critical path circuit must therefore reflect not only FEOL (front-end-of-line) but also BEOL (back-end-of-line) variations. Since the number of BEOL metal layers exceeds ten, and each layer exhibits resistance and capacitance variation intermixed with resistance variation on the vias between layers, synthesizing a representative critical path circuit that provides accurate performance prediction requires exploring a very high-dimensional design space. To cope with this, I propose a BEOL-aware methodology for synthesizing a representative critical path circuit, which incrementally explores routing patterns (i.e., BEOL reconfiguring) as well as gate resizing, starting from the path that is slowest in the absence of process variation. Precisely, the synthesis framework integrates a set of novel techniques: (1) extracting and classifying BEOL configurations, grouping similar configurations into the same category to lighten the design-space complexity, (2) formulating BEOL random variables for fast and accurate timing analysis, and (3) exploring alternative (ring oscillator) circuit structures to extend the applicability of this work. Secondly, the complexity of design rules has been increasing, resulting in more design rule violations while routing the interconnections among standard cells; in addition, the size of standard cells keeps decreasing, which makes routing harder. In the conventional P&R flow, the routability of a pre-routed layout is predicted from the routing congestion obtained by global routing (the tracks required to connect all cells, the tracks available, and the difference between them), and placement is then optimized not to cause design rule violations. But this turned out to be inaccurate in advanced technology nodes, so routability must be predicted with more features. I propose a methodology for predicting the hotspots of design rule violations (DRVs) using machine learning with placement-related features in addition to the conventional routing congestion, and perturbing placed cells to reduce the number of DRVs.
Precisely, the hotspots are predicted by a pre-trained binary classification model, and placement perturbation is performed by global optimization methods to minimize the number of DRVs predicted by a pre-trained regression model. To do this, the framework is composed of three techniques: (1) dividing the circuit layout into multiple rectangular grids and extracting features such as pin density, cell density, global routing results (demand, capacity, and overflow), and more in the placement phase, (2) predicting whether each grid has DRVs using a binary classification model, and (3) perturbing the placed standard cells in the hotspots to minimize the number of DRVs predicted by a regression model.
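A minimal sketch of the grid-based hotspot classification step might look like the following. The feature set mirrors the ones named above (pin density, cell density, routing demand/capacity/overflow), but the random-forest model and all data are illustrative assumptions rather than the dissertation's exact setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

rng = np.random.default_rng(1)

# Synthetic per-grid features in the spirit of the dissertation: pin density,
# cell density, and global-routing demand/capacity ratio and overflow.
n_grids = 5000
X = np.column_stack([
    rng.uniform(0, 1, n_grids),        # pin density
    rng.uniform(0, 1, n_grids),        # cell density
    rng.uniform(0, 1, n_grids),        # routing demand / capacity
    rng.uniform(-0.2, 0.4, n_grids),   # overflow
])
# Toy ground truth: congested, pin-dense grids tend to contain DRV hotspots.
y = ((0.5 * X[:, 0] + 0.3 * X[:, 2] + X[:, 3]
      + rng.normal(0, 0.1, n_grids)) > 0.7).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(f"hotspot F1 on held-out grids: {f1_score(y_te, clf.predict(X_te)):.3f}")
```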

    An Ultra-Low-Energy, Variation-Tolerant FPGA Architecture Using Component-Specific Mapping

    As feature sizes scale toward atomic limits, parameter variation continues to increase, leading to increased margins in both delay and energy. Parameter variation both slows down devices and causes devices to fail. For applications that require high performance, the possibility of very slow devices on critical paths forces designers to reduce clock speed in order to meet timing. For an important and emerging class of applications that target energy-minimal operation at the cost of delay, the impact of variation-induced defects at very low voltages mandates the sizing up of transistors and operation at higher voltages to maintain functionality. With post-fabrication configurability, FPGAs have the opportunity to self-measure the impact of variation, determining the speed and functionality of each individual resource. Given that information, a delay-aware router can use slow devices on non-critical paths, use fast devices on critical paths, and avoid known defects. By mapping each component individually and customizing designs to a component's unique physical characteristics, we demonstrate that we can eliminate delay margins and reduce energy margins caused by variation. To quantify the potential benefit of component-specific mapping, we first measure the margins associated with parameter variation, and then focus primarily on the energy benefits of FPGA delay-aware routing over a wide range of predictive technologies (45 nm--12 nm) for the Toronto20 benchmark set. We show that relative to delay-oblivious routing, delay-aware routing without any significant optimizations can reduce minimum energy/operation by 1.72x at 22 nm. We demonstrate how to construct an FPGA architecture specifically tailored to further increase the minimum energy savings of component-specific mapping by using the following techniques: power gating, gate sizing, interconnect sparing, and LUT remapping. With all optimizations considered, we show a minimum energy/operation savings of 2.66x at 22 nm, or 1.68--2.95x across 45--12 nm. As there are many challenges to measuring resource delays and mapping per chip, we discuss methods that may make component-specific mapping more practical. We demonstrate that a simpler, defect-aware routing achieves 70% of the energy savings of delay-aware routing. Finally, we show that without variation tolerance, scaling from 16 nm to 12 nm results in a net increase in minimum energy/operation; component-specific mapping, however, can extend minimum energy/operation scaling to 12 nm and possibly beyond.
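The core of delay-aware routing is a shortest-path search over the routing-resource graph that uses per-chip measured delays and skips known-defective resources. The toy sketch below shows that idea with an invented graph and made-up measurements; a real FPGA router operates on a far larger graph with congestion negotiation.

```python
import heapq

# Per-chip measured delays for routing resources (edges node -> node), plus a
# set of resources that self-test flagged as defective. All values invented.
measured_delay = {
    ("src", "a"): 1.2, ("src", "b"): 0.7,
    ("a", "sink"): 0.6, ("b", "sink"): 1.9, ("b", "a"): 0.2,
}
defective = {("src", "a")}  # avoid this slow/broken switch entirely

def delay_aware_route(source, sink):
    """Dijkstra on the routing-resource graph using measured edge delays."""
    dist, prev = {source: 0.0}, {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == sink:
            break
        for (a, b), w in measured_delay.items():
            if a != u or (a, b) in defective:
                continue
            if d + w < dist.get(b, float("inf")):
                dist[b], prev[b] = d + w, u
                heapq.heappush(heap, (d + w, b))
    path, node = [sink], sink
    while node != source:
        node = prev[node]
        path.append(node)
    return path[::-1], dist[sink]

print(delay_aware_route("src", "sink"))  # (['src', 'b', 'a', 'sink'], 1.5)
```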

    Design and Optimization for Resilient Energy Efficient Computing

    Modern electronic systems are an integral part of our everyday lives. This has been enabled, among other things, by the exponential growth in the integration density of integrated circuits together with the improvement in energy efficiency achieved over the last 50 years, also known as Moore's law. In this context, the demand for energy-efficient digital circuits has risen enormously, especially in application fields such as the Internet of Things (IoT). Since the power consumption of a circuit is strongly tied to its supply voltage, efficient techniques have been developed that scale the supply voltage into the near-threshold region, collectively known as near-threshold computing (NTC). These techniques can improve the energy efficiency of circuits by a full order of magnitude. Alongside the improved energy balance, however, numerous circuit design challenges arise. For example, reducing the supply voltage into the near-threshold region increases the sensitivity of circuits to process variation, voltage fluctuations, and temperature changes tenfold. These variations reduce the reliability of NTC circuits and are the biggest obstacle to their widespread adoption. Traditional approaches and methods from the nominal voltage regime for compensating variability cannot be applied efficiently, since the strong performance variations and sensitivities in the near-threshold region exceed their capabilities. For this reason, new design paradigms and design automation concepts are required for NTC. The goal of this work is to tackle the aforementioned problems by providing holistic methods for the design and design automation of NTC circuits, applied in particular at the circuit and logic levels. It includes in-depth reliability analyses of NTC systems and proposes optimization methods that improve reliability, performance, and energy efficiency. The contributions of this work are as follows:
Variation-aware circuit synthesis and timing closure: Meeting timing and reliability requirements is a demanding task in NTC. The effects of variability manifest themselves as strong performance fluctuations, which lead to expensive timing margins, or as hold-time violations caused by functional failures. Conventional approaches are limited to increasing timing margins alone, which is very inefficient for NTC because of the large extent of variation and the increased leakage currents. This work presents a concept for synthesizing circuits and closing timing under variations that reduces the sensitivity to variation, improves energy efficiency, performance, and reliability, and at the same time reduces the overhead of timing closure [1, 2]. Simulation results show that our proposed approach reduces delay by 87% and improves performance and energy efficiency by 25% and 7.4%, respectively, at the cost of a 4.8% increase in area.
Cross-layer reliability, energy-efficiency, and performance optimization of datapaths: Cross-layer analysis of processor datapaths, spanning all the way from the compiler to circuit design, can reveal potential optimization opportunities. A datapath is a combination of several functional units that can process diverse instructions. Our analysis shows that instruction execution times vary strongly at low supply voltages, so instructions can be classified as fast or slow; furthermore, instructions can be categorized as frequently or rarely used. This work presents a multi-cycle instruction method that increases the energy efficiency and resilience of functional units [3]. In addition, we present a partitioning algorithm that enables fine-grained power gating of rarely used units [4] by partitioning individual functional units into several smaller ones. The proposed methods significantly improve circuit timing while considerably limiting leakage, using a combination of circuit redesign and code replacement techniques. Simulation results show that the developed methods improve the performance and energy efficiency of arithmetic logic units (ALUs) by 19% and 43%, respectively. Furthermore, the performance gain of the optimized circuits can be converted into improved reliability [5, 6].
Post-fabrication and runtime tuning: Process and runtime variations strongly influence the minimum energy point (MEP) of NTC circuits, i.e., the most energy-efficient supply voltage. It is of particular interest to calibrate an NTC circuit after fabrication so that it operates at the MEP and achieves the best energy efficiency. This work proposes post-fabrication and runtime tuning techniques that calibrate the circuit to the MEP based on post-fabrication speed and power measurements. The presented techniques determine the MEP on a per-chip basis to account for process variation, and dynamically adapt supply voltage and frequency to address time-dependent variations such as workload and temperature. To this end, a regression model is integrated into the chip's firmware, which extracts the MEP at runtime based on workload and temperature measurements. The regression model is unique to each chip and is based solely on post-fabrication measurements. Simulation results show that the developed approach achieves very high prediction accuracy and energy efficiency, similar to hardware-implemented methods, but without the hardware overhead [7, 8].
Selective flip-flop optimization: Ultra-low-voltage circuits must also operate in the nominal supply voltage mode to meet the timing requirements of running applications. In this case, the circuit is subject to strong aging processes that degrade the transistors by increasing their threshold voltages. Our in-depth analyses have shown that certain flip-flop architectures are affected by this aging when constant values ('0' or '1') are stored for a long time. Compared to other components, flip-flops are more sensitive to aging and, among other failures, fail to latch a new value within the specified timing window. Moreover, even a minor voltage drop can lead to such timing violations if the affected aged flip-flops lie on the critical path. This work presents a selective flip-flop optimization method that optimizes circuits for robustness against static aging and voltage drop. First, optimized robust flip-flops are generated and integrated into the standard cell libraries; flip-flops on the critical path that experience aging and voltage drop are then replaced by the optimized robust versions to improve the timing behavior and reliability of the circuit [9, 10]. Simulation results show that the expected lifetime of a processor can be improved by 37% while leakage increases by only 0.1%.
While NTC has the potential to deliver great energy efficiency, its deployment in new application fields such as the IoT is not yet feasible because of the aforementioned problems of high sensitivity to variations and the resulting lack of reliability. In this dissertation and in yet unpublished work [11-17], we present solutions to these problems that enable the integration of NTC into today's systems.
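To illustrate the post-fabrication tuning idea, the sketch below fits a per-chip linear regression that maps workload and temperature to the measured minimum-energy-point supply voltage, cheap enough to evaluate in firmware at runtime. The linear model form and all calibration numbers are assumptions for illustration, not the dissertation's actual model.

```python
import numpy as np

rng = np.random.default_rng(2)

# Post-fabrication calibration data for ONE chip: at several (workload,
# temperature) points, the tester sweeps Vdd and records the measured
# minimum-energy-point voltage. All numbers here are synthetic.
workload = rng.uniform(0.1, 1.0, 40)       # normalized activity factor
temp_c   = rng.uniform(0, 85, 40)          # die temperature, Celsius
v_mep    = (0.45 + 0.10 * workload - 0.0008 * temp_c
            + rng.normal(0, 0.005, 40))    # measured MEP Vdd (volts)

# Linear regression v_mep ~ [1, workload, temp]; simple enough for firmware
# to evaluate with two multiplies and two adds per update.
A = np.column_stack([np.ones_like(workload), workload, temp_c])
coef, *_ = np.linalg.lstsq(A, v_mep, rcond=None)

def predict_mep(workload_now, temp_now):
    return coef[0] + coef[1] * workload_now + coef[2] * temp_now

print(f"predicted MEP Vdd at 60% load, 50 C: {predict_mep(0.6, 50.0):.3f} V")
```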

    Resource Management Algorithms for Computing Hardware Design and Operations: From Circuits to Systems

    The complexity of computing hardware has increased at an unprecedented rate over the last few decades. At the chip level, we have entered the era of multi/many-core processors made of billions of transistors. With a transistor budget of this scale, many functions are integrated into a single chip; chips today consist of many heterogeneous cores with intensive interaction among them. At the circuit level, with the end of Dennard scaling, continuously shrinking process technology has imposed a grand challenge on power density, and circuit variation further exacerbates the problem by consuming substantial timing margins. At the system level, the rise of warehouse-scale computers and data centers has put resource management into a new perspective: the ability to dynamically provision computation resources in these gigantic systems is crucial to their performance. In this thesis, three resource management algorithms are discussed. The first algorithm assigns adaptivity resources to circuit blocks under a constraint on overhead; the adaptivity improves the resilience of the circuit to variation in a cost-effective way. The second algorithm manages link bandwidth in application-specific Networks-on-Chip; quality of service is guaranteed for time-critical traffic, with an emphasis on power. The third algorithm manages the computation resources of a data center with precautions against ill states of the system; Q-learning is employed to cope with the dynamic nature of the system, and Linear Temporal Logic is leveraged as a tool to describe temporal constraints. All three algorithms are evaluated through extensive experiments, and the results are compared with several previous works, showing the advantage of our methods.
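As a rough illustration of the third algorithm's setting, the sketch below runs tabular Q-learning on a toy server-provisioning MDP in which "ill" (overloaded) states carry a heavy penalty. The environment, reward shape, and discretization are invented, and the Linear Temporal Logic machinery is omitted entirely.

```python
import numpy as np

rng = np.random.default_rng(3)

N_LOAD, N_ACT = 5, 3            # discretized load levels; action = server groups
Q = np.zeros((N_LOAD, N_ACT))
alpha, gamma, eps = 0.1, 0.95, 0.1

def step(load, servers):
    """Toy environment: load drifts randomly; reward trades energy vs. overload."""
    next_load = int(np.clip(load + rng.integers(-1, 2), 0, N_LOAD - 1))
    overload = max(0, next_load - 2 * servers)   # "ill state": demand > capacity
    reward = -1.0 * servers - 10.0 * overload    # energy cost + heavy penalty
    return next_load, reward

load = 2
for _ in range(20000):
    a = rng.integers(N_ACT) if rng.random() < eps else int(np.argmax(Q[load]))
    nxt, r = step(load, a)
    Q[load, a] += alpha * (r + gamma * Q[nxt].max() - Q[load, a])  # Q-update
    load = nxt

print("learned policy (server groups per load level):", Q.argmax(axis=1))
```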

    Adaptive Integrated Circuit Design for Variation Resilience and Security

    The past few decades have witnessed the burgeoning development of integrated circuits in terms of process technology scaling. Along with the tremendous benefits of scaling, challenges are presented at various stages. At design time, the complexity of developing a circuit with millions to billions of ever smaller transistors grows further once variations are taken into account; the difficulty of analyzing these nondeterministic properties makes it hard for redundant-resource allocation schemes to work cost-efficiently. Besides fabrication variations, analog circuits suffer severe performance degradation owing to physical attributes that are vulnerable to aging effects. As such, post-silicon calibration approaches are gaining increasing attention as a way to compensate for performance mismatch. For user-end applications, additional system failures result from pirated and counterfeit devices supplied by an untrusted semiconductor supply chain; again, analog circuits are weak against this threat due to the shortage of piracy-avoidance techniques. In this dissertation, we propose three adaptive integrated circuit designs to overcome these challenges. The first investigates variability-aware gate implementation while controlling the overhead of adaptivity assignment; this design improves variation resilience, typically for digital circuits, while optimizing power consumption and timing yield. The second design is a self-validation system for the calibration of diverse analog circuits; the system is completely integrated on chip for convenience, requiring no external assistance. In the last design, a classic analog component is further studied to establish a configurable locking mechanism for analog circuits; the use of Satisfiability Modulo Theories addresses the difficulty of searching for the unique unlocking pattern over non-Boolean variables.
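To see why Satisfiability Modulo Theories fits the non-Boolean unlocking problem, the toy z3 query below searches for real-valued bias "key" voltages that bring a locked stage's gain back into spec. The gain expression, bias ranges, and spec window are contrived for illustration and are not the dissertation's circuit.

```python
from z3 import Real, Solver, And, sat

# Toy locked analog stage: the effective gain depends on two hidden bias
# "key" voltages k1, k2; the circuit meets spec only for the right keys.
k1, k2 = Real("k1"), Real("k2")
gain = 20 * k1 - 5 * k2 * k2        # contrived non-Boolean gain expression

s = Solver()
s.add(And(k1 >= 0, k1 <= 1, k2 >= 0, k2 <= 1))   # physical bias range
s.add(gain >= 9.8, gain <= 10.2)                 # gain spec window

if s.check() == sat:
    m = s.model()
    print("unlocking bias found:", m[k1], m[k2])
else:
    print("no feasible key in range")
```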

    Proximity Optimization for Adaptive Circuit Design

    The performance growth of conventional VLSI circuits is seriously hampered by various variation effects and the fundamental limit on chip power density. Adaptive circuit design is recognized as a power-efficient approach to tackling the variation challenge. However, it tends to entail large area overhead if not carefully designed. This work studies how to reduce the overhead by forming adaptivity blocks considering both timing and physical proximity among logic cells. The proximity optimization consists of timing- and location-aware cell clustering and incremental placement enforcing the clusters. Experiments are performed on the ICCAD 2014 benchmark circuits, which include cases of nearly one million cells. The results show that during clustering, location proximity among logic cells is as important as timing proximity. Compared to alternative methods, our approach achieves 25% to 75% area overhead reduction with an average of 0.6% wirelength overhead, while retaining about the same timing yield and power consumption.
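A minimal sketch of timing- and location-aware clustering: scale a slack feature against the (x, y) coordinates and run k-means, so that cells which are physically close and similarly critical land in the same adaptivity block. The weight, cell data, and cluster count are invented, and the paper's incremental placement step is not shown.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)

# Synthetic placed cells: (x, y) locations in microns and timing slack in ns.
n_cells = 1000
xy = rng.uniform(0, 500, size=(n_cells, 2))
slack = rng.normal(0.3, 0.15, n_cells)

# Joint feature: physical proximity plus timing proximity. The weight w trades
# wirelength overhead (tight spatial clusters) against timing homogeneity
# inside each adaptivity block; w is a tuning assumption.
w = 200.0
features = np.column_stack([xy, w * slack])

blocks = KMeans(n_clusters=20, n_init=10, random_state=0).fit_predict(features)
for b in range(3):  # inspect a few blocks
    members = blocks == b
    print(f"block {b}: {members.sum():4d} cells, "
          f"slack spread {np.ptp(slack[members]):.3f} ns")
```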

    AI/ML Algorithms and Applications in VLSI Design and Technology

    An evident challenge ahead for the integrated circuit (IC) industry in the nanometer regime is the investigation and development of methods that can reduce the design complexity ensuing from growing process variations and curtail the turnaround time of chip manufacturing. Conventional methodologies employed for such tasks are largely manual and thus time-consuming and resource-intensive. In contrast, the unique learning strategies of artificial intelligence (AI) provide numerous exciting automated approaches for handling complex and data-intensive tasks in very-large-scale integration (VLSI) design and testing. Employing AI and machine learning (ML) algorithms in VLSI design and manufacturing reduces the time and effort needed to understand and process data within and across different abstraction levels via automated learning algorithms, which in turn improves IC yield and reduces manufacturing turnaround time. This paper thoroughly reviews the AI/ML automated approaches introduced to date for VLSI design and manufacturing. Moreover, we discuss the future scope of AI/ML applications at various abstraction levels to revolutionize the field of VLSI design, aiming for high-speed, highly intelligent, and efficient implementations.
    • โ€ฆ
    corecore