494 research outputs found
EffiTest: Efficient Delay Test and Statistical Prediction for Configuring Post-silicon Tunable Buffers
At nanometer manufacturing technology nodes, process variations significantly
affect circuit performance. To combat them, post-silicon clock tuning buffers
can be deployed to balance timing budgets of critical paths for each
individual chip after manufacturing. The challenge of this method is that path
delays should be measured for each chip to configure the tuning buffers
properly. Current methods for this delay measurement rely on path-wise
frequency stepping. This strategy, however, requires too much time from
expensive testers. In this paper, we propose an efficient delay test framework
(EffiTest) to solve the post-silicon testing problem by aligning path delays
using the already-existing tuning buffers in the circuit. In addition, we only
test representative paths and the delays of other paths are estimated by
statistical delay prediction. Experimental results demonstrate that the
proposed method can reduce the number of frequency stepping iterations by more
than 94% with only a slight yield loss.
Comment: ACM/IEEE Design Automation Conference (DAC), June 201
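To make the cost of path-wise frequency stepping concrete, the following is a minimal sketch of measuring a single path delay by stepping the clock period. It assumes a binary-search stepping strategy and a hypothetical tester callback `passes_at(period)`; neither is taken from the paper, and this is not the EffiTest framework itself. Every measured path needs its own search of this kind, which is the tester time the abstract refers to.

```python
def measure_delay_by_frequency_stepping(passes_at, lo, hi, resolution):
    """Estimate a path delay by stepping the clock period (binary search).

    passes_at(period) is a hypothetical tester callback that returns True
    when the path meets timing at the given clock period. The search assumes
    the path fails at `lo` and passes at `hi`.
    Returns (estimated_delay, number_of_stepping_iterations).
    """
    iterations = 0
    while hi - lo > resolution:
        mid = (lo + hi) / 2.0
        iterations += 1
        if passes_at(mid):
            hi = mid  # path passes: the true delay is at most mid
        else:
            lo = mid  # path fails: the true delay is above mid
    return hi, iterations


# Illustrative values only: a path whose (hidden) true delay is 1.37 ns.
true_delay = 1.37
estimate, steps = measure_delay_by_frequency_stepping(
    lambda period: period >= true_delay, lo=0.0, hi=4.0, resolution=0.01)
print(estimate, steps)
```

The iteration count grows with the search range and the required resolution, and it is multiplied by the number of paths measured on each chip.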
Design Methodology for High-Reliability Clock Networks in Low-Power, High-Performance Digital Systems
Thesis (Ph.D.) -- Seoul National University Graduate School: Department of Electrical and Computer Engineering, 2015. 8. Taewhan Kim.
As process variation increasingly dominates the clock timing variation among chips, a conventional clock network based on a clock tree can no longer guarantee the timing constraints of a digital system. To overcome the limitations of traditional clock design techniques, various techniques have been studied. This dissertation addresses three techniques that have been widely used for designing robust clock networks and proposes improved methods for each.
First, it is widely accepted that post-silicon tunable (PST) clock buffers can effectively resolve the clock timing violation. Since PST buffers, which can reset the clock delay to flip-flops after the chip is manufactured, impose a non-trivial implementation area and control circuitry, it is very important to minimally allocate PST buffers while satisfying the chip yield constraint. In this dissertation, we (1) develop a graph-based chip yield computation technique which can update yields very efficiently and accurately for incremental PST buffer allocation, based on which we (2) propose a systematic (bottom-up and top-down with refinement) PST buffer allocation algorithm that is able to fully explore the design space of PST buffer allocation.
Second, clock skew scheduling is one of the essential steps that must be carefully performed during the design process. This dissertation addresses the clock skew optimization problem integrated with the consideration of the interdependent relation between the setup and hold skews, and clk-to-Q delay of flip-flops, so that the time margin is more accurately and reliably set aside over that of the previous methods, which have never taken the integrated problem into account. Precisely, based on an accurate flexible model of setup skew, hold skew, and clk-to-Q delay, we propose a stepwise clock skew scheduling technique in which at each iteration, the worst slack of setup and hold skews is systematically and incrementally relaxed to maximally extend the time margin.
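As a point of reference for the skew scheduling problem above, here is a minimal sketch of the conventional setup and hold slack computation for one launch/capture flip-flop pair. It assumes fixed setup time, hold time, and clk-to-Q delay, which is exactly the simplification the dissertation moves away from; the function name, parameters, and numbers are illustrative assumptions, not the author's model.

```python
def setup_hold_slack(period, t_launch, t_capture,
                     clk_to_q, d_max, d_min, t_setup, t_hold):
    """Conventional (fixed-value) slack of one flip-flop-to-flip-flop path.

    t_launch / t_capture are the clock arrival times at the launching and
    capturing flip-flops; their difference is the clock skew of the pair.
    d_max / d_min are the longest and shortest combinational delays.
    """
    skew = t_capture - t_launch
    # Setup: data launched at t_launch must arrive t_setup before the
    # next capturing clock edge at t_capture + period.
    setup_slack = period + skew - (clk_to_q + d_max + t_setup)
    # Hold: the earliest new data must not arrive before t_hold after
    # the capturing edge at t_capture.
    hold_slack = (clk_to_q + d_min) - (skew + t_hold)
    return setup_slack, hold_slack


# Illustrative numbers (ns): zero skew violates setup, a 0.1 ns skew fixes it.
no_skew = setup_hold_slack(1.0, 0.0, 0.0, 0.1, 0.9, 0.2, 0.05, 0.05)
with_skew = setup_hold_slack(1.0, 0.0, 0.1, 0.1, 0.9, 0.2, 0.05, 0.05)
print(no_skew, with_skew)
```

Adding skew trades hold slack for setup slack one for one. A skew scheduler searches for clock arrival times that raise the worst slack over all flip-flop pairs; in the flexible model the abstract describes, the three flip-flop parameters would depend on each other rather than being constants as they are here.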
Lastly, clock trees with cross links and clock spines have intermediate characteristics in skew tolerance and power consumption, compared to the clock tree and the clock mesh, which are the two extreme structures of clock networks. Unlike the clock tree with links between clock nodes, which is an incremental modification of the clock tree structure, the clock spine network is a structure completely separate from those of the tree and the mesh. Consequently, it is essential to develop a synthesis algorithm for clock spines that is compatible with the existing synthesis algorithms for clock trees and clock meshes. To this end, this dissertation first addresses the problem of automating the synthesis of clock-gated clock spines with the objective of minimizing total clock power while meeting the clock skew and slew constraints. The key idea of our proposed synthesis algorithm is to identify and group the flip-flops with tightly correlated clock-gating operations to form a spine, while accurately predicting and maintaining clock skew and slew variations through buffer insertion and stub allocation.
In summary, this dissertation presents clock tuning techniques that consider post-silicon tuning, a flexible flip-flop timing model, and a clock-gated clock spine synthesis algorithm.
Abstract
Chapter 1 INTRODUCTION
1.1 Clock Distribution Network
1.2 Process Variation
1.3 Flexible Flip-flop Timing Model
1.4 Clock Spine
1.5 Contributions of This Dissertation
Chapter 2 POST-SILICON TUNABLE CLOCK BUFFER ALLOCATION BASED ON FAST CHIP YIELD COMPUTATION
2.1 Introduction
2.2 Systematic Exploration of PST Buffer Allocation
2.2.1 Observations
2.2.2 Problem Definition
2.2.3 Allocation Algorithm
2.3 Fast Timing Yield Computation
2.3.1 Preliminaries
2.3.2 Incremental Yield Computation
2.4 Experimental Result
2.5 PST Buffer Configuration Techniques
2.6 Summary
Chapter 3 POST-SILICON TUNING BASED ON FLEXIBLE FLIP-FLOP TIMING
3.1 Introduction
3.2 Preliminary and Definitions
3.2.1 Flexible Flip-Flop Timing Model
3.2.2 Definitions
3.3 Motivational Examples
3.4 Clock Skew Scheduling for Slack Relaxation Based on Flexible Flip-Flop Timing
3.4.1 Overall Flow
3.4.2 Finding Local Clock Skew Schedule
3.5 Experimental Results
3.6 Summary
Chapter 4 SYNTHESIS FOR POWER-AWARE CLOCK SPINES
4.1 Introduction
4.2 Preliminaries and Motivation
4.2.1 Clock Spine
4.2.2 Activity Patterns
4.2.3 Power Computation
4.3 Algorithm for Clock Spine Synthesis
4.3.1 Problem Definition
4.3.2 Power-Aware Sink Clustering
4.3.3 Spine Relaxation
4.3.4 Spine Buffer Allocation
4.3.5 Top-Level Tree Construction
4.4 Experimental Results
4.5 Summary
Chapter 5 CONCLUSION
5.1 Chapter 2
5.2 Chapter 3
5.3 Chapter 4
Bibliography
Abstract (in Korean)
Variability-Aware VLSI Design Automation For Nanoscale Technologies
As technology scaling enters the nanometer regime, design of large scale ICs gets more challenging due to shrinking feature sizes and increasing design complexity. Aggressive scaling causes significant degradation in reliability, increased susceptibility to fabrication and environmental randomness and increased dynamic and leakage power dissipation. In this work, we investigate these scaling issues in large scale integrated systems.
This dissertation proposes to develop variability-aware design methodologies by proposing design analysis, design-time optimization, post-silicon tunability and runtime-adaptivity based optimization techniques for handling variability. We discuss our research in the area of variability-aware analysis, specifically
focusing on the problem of statistical timing analysis. The first technique presents the concept of error budgeting that achieves significant runtime speedups during statistical timing analysis. The second work presents a general framework for non-linear non-Gaussian statistical timing analysis considering correlations.
Further, we present our work on design-time optimization schemes that are applicable during physical synthesis. Firstly, we present a buffer insertion technique that considers wire-length uncertainty and proposes algorithms to perform probabilistic buffer insertion. Secondly, we present a stochastic optimization framework
based on a Monte Carlo technique considering fabrication variability. This optimization framework can be applied to problems that can be modeled as linear programs without imposing any assumptions on the nature of the variability.
Subsequently, we present our work on post-silicon tunability based design optimization. This work presents a design management framework that can be used to balance the effort spent on pre-silicon (through gate sizing) and post-silicon optimization (through tunable clock-tree buffers) while maximizing the yield gains. Lastly, we present our work on variability-aware runtime optimization techniques. We look at the problem of runtime supply voltage scaling for dynamic power optimization, and propose a framework to consider the impact of variability on the reliability of such designs. We propose a probabilistic design synthesis technique
where reliability of the design is a primary optimization metric.
Algorithmic techniques for nanometer VLSI design and manufacturing closure
As Very Large Scale Integration (VLSI) technology moves to the nanoscale
regime, design and manufacturing closure becomes very difficult to achieve due to
increasing chip and power density. Imperfections due to process, voltage and temperature variations aggravate the problem. Uncertainty in the electrical characteristics of
individual device and wire may cause significant performance deviations or even functional failures. These impose tremendous challenges to the continuation of Moore's
law as well as the growth of semiconductor industry.
Efforts are needed in both deterministic design stage and variation-aware design
stage. This research proposes various innovative algorithms to address both stages for
obtaining a design with high frequency, low power and high robustness. For deterministic optimizations, new buffer insertion and gate sizing techniques are proposed. For
variation-aware optimizations, new lithography-driven and post-silicon tuning-driven
design techniques are proposed.
For buffer insertion, a new slew buffering formulation is presented and is proved
to be NP-hard. Despite this, a highly efficient algorithm which runs > 90x faster
than the best alternatives is proposed. The algorithm is also extended to handle
continuous buffer locations and blockages.
For gate sizing, a new algorithm is proposed to handle discrete gate library in
contrast to unrealistic continuous gate library assumed by most existing algorithms. Our approach is a continuous solution guided dynamic programming approach, which
integrates the high solution quality of dynamic programming with the short runtime
of rounding continuous solution.
For lithography-driven optimization, the problem of cell placement considering
manufacturability is studied. Three algorithms are proposed to handle cell flipping
and relocation. They are based on dynamic programming and graph theoretic approaches, and can provide different tradeoffs between variation reduction and
wire-length increase.
For post-silicon tuning-driven optimization, the problem of unified adaptivity
optimization on logical and clock signal tuning is studied, which enables us to significantly save resources. The new algorithm is based on a novel linear programming
formulation which is solved by an advanced robust linear programming technique.
The continuous solution is then discretized using binary search accelerated dynamic
programming, batch based optimization, and Latin Hypercube sampling based fast
simulation.
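The abstract names Latin Hypercube sampling as the basis of its fast simulation step. A minimal, generic sketch of that sampling scheme on the unit hypercube is given below; how the samples map to variation parameters or feed the discretization step is not stated in the abstract, so the function name and usage here are assumptions for illustration only.

```python
import random


def latin_hypercube(n_samples, n_dims, rng):
    """Draw n_samples points in [0, 1)^n_dims by Latin Hypercube sampling.

    Each dimension is cut into n_samples equal strata and every stratum is
    hit exactly once, so the samples cover each axis evenly.
    """
    columns = []
    for _ in range(n_dims):
        # One uniformly placed point inside each stratum [i/n, (i+1)/n).
        points = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(points)  # pair strata at random across dimensions
        columns.append(points)
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


samples = latin_hypercube(8, 2, random.Random(0))
for point in samples:
    print(point)
```

Compared with plain Monte Carlo sampling, the stratification tends to need fewer samples for the same coverage of each variation parameter, which fits the abstract's use of the method for fast simulation.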
MANAGING AND LEVERAGING VARIATIONS AND NOISE IN NANOMETER CMOS
Advanced CMOS technologies have enabled high-density designs at the cost of a complex fabrication process. Variation in oxide thickness and Random Dopant Fluctuation (RDF) lead to variation in transistor threshold voltage Vth. The current photolithography process used for printing ever-smaller critical dimensions results in variation in transistor channel length and width. A related challenge in nanometer CMOS is that of on-chip random noise. With decreasing threshold and operating voltages and increasing operating temperature, CMOS devices in advanced technologies are more sensitive to random on-chip noise.
In this thesis, we explore novel circuit techniques to manage the impact of process variation in nanometer CMOS technologies. We also analyze the impact of on-chip noise on CMOS circuits and propose techniques to leverage or manage the impact of noise, depending on the application. The True Random Number Generator (TRNG) is an interesting cryptographic primitive that leverages on-chip noise to generate random bits; however, it is highly sensitive to process variation. We explore novel metastability circuits to alleviate the impact of variations and at the same time leverage on-chip noise sources like Random Thermal Noise and Random Telegraph Noise (RTN) to generate high-quality random bits. We develop stochastic models for metastability-based TRNG circuits to analyze the impact of variation and noise. The stochastic models are used to analyze and compare low-power, energy-efficient and lightweight post-processing techniques targeted to low-power applications like System on Chip (SoC) and RFID. We also propose variation-aware circuit calibration techniques to increase reliability. We extend this technique to the more generic application of designing Post-Si Tunable (PST) clock buffers to increase parametric yield in the presence of process variation. Apart from the one-time variation due to the fabrication process, transistors undergo constant change in threshold voltage due to aging/wear-out effects and RTN. Process variation affects conventional sensors and introduces inaccuracies during measurement. We present a lightweight wear-out sensor that is tolerant to process variation and provides fine-grained wear-out sensing. A similar circuit is designed to sense fluctuation in transistor threshold voltage due to RTN. Although thermal noise and RTN are leveraged in applications like TRNG, they affect the stability of sensitive circuits like Static Random Access Memory (SRAM). We analyze the impact of on-chip noise on the Bit Error Rate (BER) and post-Si test coverage of SRAM cells.
Driving the Network-on-Chip Revolution to Remove the Interconnect Bottleneck in Nanoscale Multi-Processor Systems-on-Chip
The sustained demand for faster, more powerful chips has been met by the
availability of chip manufacturing processes allowing for the integration of increasing
numbers of computation units onto a single die. The resulting outcome,
especially in the embedded domain, has often been called SYSTEM-ON-CHIP
(SoC) or MULTI-PROCESSOR SYSTEM-ON-CHIP (MP-SoC).
MPSoC design brings to the foreground a large number of challenges, one of
the most prominent of which is the design of the chip interconnection. With a
number of on-chip blocks presently ranging in the tens, and quickly approaching
the hundreds, the novel issue of how to best provide on-chip communication
resources is clearly felt.
NETWORKS-ON-CHIPS (NoCs) are the most comprehensive and scalable
answer to this design concern. By bringing large-scale networking concepts to
the on-chip domain, they guarantee a structured answer to present and future
communication requirements. The point-to-point connection and packet switching
paradigms they involve are also of great help in minimizing wiring overhead
and physical routing issues. However, as with any technology of recent inception,
NoC design is still an evolving discipline. Several main areas of interest
require deep investigation for NoCs to become viable solutions:
• The design of the NoC architecture needs to strike the best tradeoff among
performance, features and the tight area and power constraints of the on-chip
domain.
• Simulation and verification infrastructure must be put in place to explore,
validate and optimize the NoC performance.
• NoCs offer a huge design space, thanks to their extreme customizability in
terms of topology and architectural parameters. Design tools are needed
to prune this space and pick the best solutions.
• Even more so given their global, distributed nature, it is essential to evaluate
the physical implementation of NoCs to evaluate their suitability for
next-generation designs and their area and power costs.
This dissertation performs a design space exploration of network-on-chip architectures,
in order to point out the trade-offs associated with the design of
each individual network building block and with the design of the network topology
overall. The design space exploration is preceded by a comparative analysis
of state-of-the-art interconnect fabrics with themselves and with early network-on-chip
prototypes. The ultimate objective is to point out the key advantages
that NoC realizations provide with respect to state-of-the-art communication
infrastructures and to point out the challenges that lie ahead in order to make
this new interconnect technology come true. Among the latter, technology-related
challenges are emerging that call for dedicated design techniques at all
levels of the design hierarchy, in particular leakage power dissipation and the containment
of process variations and of their effects. The achievement of the above
objectives was enabled by means of a NoC simulation environment for cycle-accurate
modelling and simulation and by means of a back-end facility for the
study of NoC physical implementation effects. Overall, all the results provided
by this work have been validated on actual silicon layout.
Resource and thermal management in 3D-stacked multi-/many-core systems
Continuous semiconductor technology scaling and the rapid increase in computational needs have stimulated the emergence of multi-/many-core processors. While up to hundreds of cores can be placed on a single chip, the performance capacity of the cores cannot be fully exploited due to high latencies of interconnects and memory, high power consumption, and low manufacturing yield in traditional (2D) chips. 3D stacking is an emerging technology that aims to overcome these limitations of 2D designs by stacking processor dies over each other and using through-silicon-vias (TSVs) for on-chip communication, and thus, provides a large amount of on-chip resources and shortens communication latency. These benefits, however, are limited by challenges in high power densities and temperatures.
3D stacking also enables integrating heterogeneous technologies into a single chip. One example of heterogeneous integration is building many-core systems with silicon-photonic network-on-chip (PNoC), which reduces on-chip communication latency significantly and provides higher bandwidth compared to electrical links. However, silicon-photonic links are vulnerable to on-chip thermal and process variations. These variations can be countered by actively tuning the temperatures of optical devices through micro-heaters, but at the cost of substantial power overhead.
This thesis claims that unearthing the energy efficiency potential of 3D-stacked systems requires intelligent and application-aware resource management. Specifically, the thesis improves energy efficiency of 3D-stacked systems via three major components of computing systems: cache, memory, and on-chip communication. We analyze characteristics of workloads in computation, memory usage, and communication, and present techniques that leverage these characteristics for energy-efficient computing.
This thesis introduces 3D cache resource pooling, a cache design that allows for flexible heterogeneity in cache configuration across a 3D-stacked system and improves cache utilization and system energy efficiency. We also demonstrate the impact of resource pooling on a real prototype 3D system with scratchpad memory.
At the main memory level, we claim that utilizing heterogeneous memory modules and memory object level management significantly helps with energy efficiency. This thesis proposes a memory management scheme at a finer granularity: memory object level, and a page allocation policy to leverage the heterogeneity of available memory modules and cater to the diverse memory requirements of workloads.
On the on-chip communication side, we introduce an approach to limit the power overhead of PNoC in (3D) many-core systems through cross-layer thermal management. Our proposed thermally-aware workload allocation policies coupled with an adaptive thermal tuning policy minimize the required thermal tuning power for PNoC, and in this way, help broader integration of PNoC. The thesis also introduces techniques in placement and floorplanning of optical devices to reduce optical loss and, thus, laser source power consumption.
2018-03-09T00:00:00