199 research outputs found

    ๋น„์šฉ ํšจ์œจ์ ์ธ ํด๋Ÿญ ๋ฐ ํŒŒ์›Œ ๊ฒŒ์ดํŒ… ์„ค๊ณ„ ๋ฐฉ๋ฒ•๋ก 

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ(๋ฐ•์‚ฌ)--์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› :๊ณต๊ณผ๋Œ€ํ•™ ์ „๊ธฐยท์ •๋ณด๊ณตํ•™๋ถ€,2020. 2. ๊น€ํƒœํ™˜.์ €์ „๋ ฅ ์„ค๊ณ„๋Š” ์ตœ์‹  ์‹œ์Šคํ…œ-์˜จ-์นฉ (SoCs) ์„ค๊ณ„์—์„œ ๋งค์šฐ ์ค‘์š”ํ•œ ์š”์†Œ ์ค‘์˜ ํ•˜๋‚˜์ด๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ๋™์  ๋ฐ ์ •์  ์ „๋ ฅ ์†Œ๋น„๋ฅผ ๊ฐ์†Œ์‹œํ‚ค๊ธฐ ์œ„ํ•œ ์ €์ „๋ ฅ ์„ค๊ณ„ ๋ฐฉ๋ฒ•๋ก ์— ๋Œ€ํ•ด ๋…ผํ•œ๋‹ค. ๊ตฌ์ฒด์ ์œผ๋กœ ๋น„์šฉ ํšจ์œจ์ ์ธ ์ €์ „๋ ฅ ์„ค๊ณ„๋ฅผ ์œ„ํ•˜์—ฌ ๋‘ ๊ฐ€์ง€ ์ƒˆ๋กœ์šด ๊ธฐ์ˆ ์„ ์ œ์•ˆํ•œ๋‹ค. ์šฐ์„  ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ๋™์  ์ „๋ ฅ ์†Œ๋น„๋ฅผ ์ค„์ผ ์ˆ˜ ์žˆ๋Š” ์ƒˆ๋กœ์šด ํด๋Ÿญ ๊ฒŒ์ดํŒ… ๋ฐฉ๋ฒ•์„ ์ œ์•ˆํ•œ๋‹ค. ๊ธฐ์กด ํ”Œ๋ฆฝ-ํ”Œ๋ž ์ž…๋ ฅ ๋ฐ์ดํ„ฐ ํ† ๊ธ€ ๊ธฐ๋ฐ˜ ํด๋Ÿญ ๊ฒŒ์ดํŒ…์€ ๊ฐ€์žฅ ๋„๋ฆฌ ์‚ฌ์šฉ๋˜๋Š” ํด๋Ÿญ ๊ฒŒ์ดํŒ… ๊ธฐ๋ฒ• ์ค‘์˜ ํ•˜๋‚˜์ด๋‹ค. ํ•˜์ง€๋งŒ ์ด ๋ฐฉ๋ฒ•์€ ๋” ๋งŽ์€ ํ”Œ๋ฆฝ-ํ”Œ๋ž์— ๋Œ€ํ•ด ์ ์šฉํ• ์ˆ˜๋ก ํด๋Ÿญ ๊ฒŒ์ดํŒ…์— ํ•„์š”ํ•œ ๋ถ€๊ฐ€ ํšŒ๋กœ๊ฐ€ ๊ธ‰๊ฒฉํžˆ ์ฆ๊ฐ€ํ•œ๋‹ค๋Š” ๊ทผ๋ณธ์ ์ธ ํ•œ๊ณ„๋ฅผ ์ง€๋‹ˆ๊ณ  ์žˆ๋‹ค. ์ด๋Ÿฌํ•œ ํ•œ๊ณ„๋ฅผ ๊ทน๋ณตํ•˜๊ธฐ ์œ„ํ•˜์—ฌ ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์ด ์ƒˆ๋กœ์šด ํด๋Ÿญ ๊ฒŒ์ดํŒ… ๋ฐฉ๋ฒ•์„ ์ œ์•ˆํ•œ๋‹ค. ์ฒซ ๋ฒˆ์งธ๋กœ ๊ธฐ์กด ์ž…๋ ฅ ๋ฐ์ดํ„ฐ ํ† ๊ธ€ ๊ธฐ๋ฐ˜ ํด๋Ÿญ ๊ฒŒ์ดํŒ… ๋ฐฉ๋ฒ•์— ํ•„์š”ํ•œ ํšŒ๋กœ ์ž์›์„ ๋ถ„์„ํ•˜์—ฌ ํ•ด๋‹น ๋ฐฉ๋ฒ•์˜ ๋น„ํšจ์œจ์„ฑ์„ ๋ณด์ด๊ณ , ๊ธฐ์กด ๋ฐฉ๋ฒ•์—์„œ ์‚ฌ์šฉ๋˜๋Š” ์ž…๋ ฅ ๋ฐ์ดํ„ฐ ํ† ๊ธ€ ๊ฒ€์ถœ์— ํ•„์ˆ˜์ ์ด์ง€๋งŒ ๊ณ ๋น„์šฉ์˜ XOR ๊ฒŒ์ดํŠธ๋ฅผ ์™„๋ฒฝํžˆ ์ œ๊ฑฐํ•œ ํ”Œ๋ฆฝ-ํ”Œ๋ž ์ƒํƒœ ๊ธฐ๋ฐ˜ ํด๋Ÿญ ๊ฒŒ์ดํŒ…'์ด๋ผ๋Š” ์ƒˆ๋กœ์šด ํด๋Ÿญ ๊ฒŒ์ดํŒ… ๋ฐฉ๋ฒ•์„ ์ œ์•ˆํ•œ๋‹ค. ๋‘ ๋ฒˆ์งธ๋กœ ์ œ์•ˆ๋œ XOR ๊ฒŒ์ดํŠธ๊ฐ€ ํ•„์š” ์—†๋Š” ํด๋Ÿญ ๊ฒŒ์ดํŒ… ๋ฐฉ๋ฒ•์„ ์œ„ํ•œ ๋ถ€๊ฐ€ ํšŒ๋กœ๋ฅผ ์ œ์‹œํ•˜๋ฉฐ, ๋‹ค์–‘ํ•œ ํƒ€์ด๋ฐ ๋ถ„์„์„ ํ†ตํ•˜์—ฌ ํ•ด๋‹น ํšŒ๋กœ๊ฐ€ ์•ˆ์ •์ ์œผ๋กœ ์ ์šฉ๋  ์ˆ˜ ์žˆ์Œ์„ ๋ณด์ธ๋‹ค. ์„ธ ๋ฒˆ์งธ๋กœ ํšŒ๋กœ์˜ ํ”Œ๋ฆฝ-ํ”Œ๋ž ์ƒํƒœ ํ”„๋กœํŒŒ์ผ์— ๊ธฐ๋ฐ˜ํ•˜์—ฌ, ์ œ์•ˆ๋œ ํด๋Ÿญ ๊ฒŒ์ดํŒ… ๊ธฐ๋ฒ•์„ ๊ธฐ์กด ํด๋Ÿญ ๊ฒŒ์ดํŒ… ๊ธฐ๋ฒ•๊ณผ ์™„๋ฒฝํ•˜๊ฒŒ ํ†ตํ•ฉํ•  ์ˆ˜ ์žˆ๋Š” ํด๋Ÿญ ๊ฒŒ์ดํŒ… ๋ฐฉ๋ฒ•๋ก ์„ ์ œ์•ˆํ•œ๋‹ค. ์—ฌ๋Ÿฌ ๋ฒค์น˜๋งˆํฌ ํšŒ๋กœ์— ๋Œ€ํ•œ ์‹คํ—˜ ๊ฒฐ๊ณผ๋Š” ๊ธฐ์กด ์ž…๋ ฅ ๋ฐ์ดํ„ฐ ํ† ๊ธ€ ๊ธฐ๋ฐ˜ ํด๋Ÿญ ๊ฒŒ์ดํŒ… ๋ฐฉ๋ฒ•์ด ์ „๋ ฅ ์†Œ๋น„ ์ ˆ๊ฐ ๊ธฐํšŒ๋ฅผ ๋†“์น˜๋Š” ๋ฐ˜๋ฉด ๋ณธ ๋…ผ๋ฌธ์—์„œ ์ œ์•ˆ๋œ ๋ฐฉ๋ฒ•์€ ๋ชจ๋“  ํƒ€์ด๋ฐ ์ œ์•ฝ ์กฐ๊ฑด์„ ๋งŒ์กฑํ•˜๋ฉด์„œ ์ „๋ ฅ ์†Œ๋น„ ๊ฐ์†Œ์— ๋งค์šฐ ํšจ๊ณผ์ ์ž„์„ ๋ณด์—ฌ์ค€๋‹ค. ๋‹ค์Œ์œผ๋กœ ์ •์  ์ „๋ ฅ ์†Œ๋น„๋ฅผ ์ค„์ด๊ธฐ ์œ„ํ•œ ๋ฐฉ์•ˆ์œผ๋กœ, ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ๊ธฐ์กด ํŒŒ์›Œ ๊ฒŒ์ดํŠธ ํšŒ๋กœ์˜ ์ƒํƒœ ๋ณด์กด์šฉ ์ €์žฅ ๊ณต๊ฐ„ ํ• ๋‹น ๋ฐฉ๋ฒ•๋“ค์ด ์ง€๋‹ˆ๊ณ  ์žˆ๋Š” ๋‘ ๊ฐ€์ง€ ์ค‘์š”ํ•œ ํ•œ๊ณ„๋“ค์„ ํ•ด๊ฒฐํ•  ์ˆ˜ ์žˆ๋Š” ๋ฐฉ๋ฒ•์„ ์ œ์•ˆํ•œ๋‹ค. ์ค‘์š”ํ•œ ํ•œ๊ณ„๋“ค์ด๋ž€ ์ฒซ ๋ฒˆ์งธ๋กœ ๋‹ค์ค‘-๋น„ํŠธ ์ƒํƒœ ๋ณด์กด ํ”Œ๋ฆฝ-ํ”Œ๋ž์˜ ๋ฌด๋ถ„๋ณ„ํ•œ ์‚ฌ์šฉ์œผ๋กœ ์ธํ•œ ๊ธด ์›จ์ดํฌ์—… ์ง€์—ฐ ์‹œ๊ฐ„์ด๋ฉฐ, ๋‘ ๋ฒˆ์งธ๋กœ ๋ฉ€ํ‹ฐํ”Œ๋ ‰์„œ ๋˜๋จน์ž„ ๋ฃจํ”„๊ฐ€ ์žˆ๋Š” ์ƒํƒœ ๋ณด์กด ํ”Œ๋ฆฝ-ํ”Œ๋ž์˜ ์ตœ์ ํ™” ๋ถˆ๊ฐ€๋Šฅ์„ฑ์ด๋‹ค. ๊ธฐ์กด ๋ฐฉ๋ฒ•๋“ค์—์„œ๋Š” ์ƒํƒœ ๋ณด์กด์„ ์œ„ํ•œ ์ €์žฅ ๊ณต๊ฐ„์„ ์ตœ์†Œํ™”ํ•˜๊ธฐ ์œ„ํ•ด ๊ธด ์›จ์ดํฌ์—… ์ง€์—ฐ ์‹œ๊ฐ„์ด ํ•„์ˆ˜์ ์ด์—ˆ๋‹ค. ๊ทธ๋ฆฌ๊ณ  ๋˜๋จน์ž„ ๋ฃจํ”„๊ฐ€ ์žˆ๋Š” ํ”Œ๋ฆฝ-ํ”Œ๋ž์€ ์ตœ์ ํ™”ํ•  ์ˆ˜ ์—†๋Š” ๋Œ€์ƒ์œผ๋กœ ๋‹ค๋ฃจ์–ด์กŒ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ์ผ๋ฐ˜์ ์œผ๋กœ ํ•˜๋“œ์›จ์–ด ๊ธฐ์ˆ  ์–ธ์–ด(HDL)๋กœ๋ถ€ํ„ฐ ์ƒ์„ฑ๋˜๋Š” ๋˜๋จน์ž„ ๋ฃจํ”„๋ฅผ ์ง€๋‹Œ ํ”Œ๋ฆฝ-ํ”Œ๋ž์€ ๋ฌด์‹œํ•  ์ˆ˜ ์žˆ์„ ์ •๋„๋กœ ์ ์€ ์–‘์ด ์•„๋‹ˆ๋‹ค. ์ฒซ ๋ฒˆ์งธ ํ•œ๊ณ„๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•œ ๋ฐฉ๋ฒ•์œผ๋กœ ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ์ตœ๋Œ€ 2 ๋น„ํŠธ์˜ ๋‹ค์ค‘-๋น„ํŠธ ์ƒํƒœ ๋ณด์กด ํ”Œ๋ฆฝ-ํ”Œ๋ž์„ ์‚ฌ์šฉํ•˜์—ฌ ์›จ์ดํฌ์—… ์ง€์—ฐ ์‹œ๊ฐ„์„ ๋‘ ํด๋Ÿญ ์‚ฌ์ดํด๋กœ ์ œํ•œํ•˜๋ฉด์„œ๋„ ์ƒํƒœ ๋ณด์กด์„ ์œ„ํ•œ ์ €์žฅ ๊ณต๊ฐ„์„ ํšจ์œจ์ ์œผ๋กœ ์ ˆ์•ฝํ•  ์ˆ˜ ์žˆ์Œ์„ ๋ณด์ธ๋‹ค. ๊ทธ๋ฆฌ๊ณ  ๋‘ ๋ฒˆ์งธ ํ•œ๊ณ„๋ฅผ ๊ทน๋ณตํ•˜๊ธฐ ์œ„ํ•ด์„œ ๋˜๋จน์ž„ ๋ฃจํ”„๋ฅผ ์ง€๋‹Œ ํ”Œ๋ฆฝ-ํ”Œ๋ž์ด ํฌํ•จ๋œ ๋‘ ํ”Œ๋ฆฝ-ํ”Œ๋ž ์Œ์˜ ์ƒํƒœ๋ฅผ ๋ณต์›ํ•  ์ˆ˜ ์žˆ๋Š” 2๋‹จ ์ƒํƒœ ๋ณด์กด ์ œ์–ด ๋ฐฉ์•ˆ์„ ์ œ์•ˆํ•œ๋‹ค. ๋˜ํ•œ ์ฃผ์–ด์ง„ ํšŒ๋กœ์—์„œ ์ถฉ๋Œ์—†์ด ๋™์‹œ์— ์กด์žฌํ•  ์ˆ˜ ์žˆ๋Š” ํ”Œ๋ฆฝ-ํ”Œ๋ž ์Œ์„ ์ตœ๋Œ€๋กœ ์ถ”์ถœํ•˜๊ธฐ ์œ„ํ•ด ๋…๋ฆฝ ์ง‘ํ•ฉ ๋ฌธ์ œ(independent set problem)๊ธฐ๋ฐ˜์˜ ์—ฐ์‚ฐ๋ฒ•๋„ ์ œ์•ˆํ•œ๋‹ค. ๋ฒค์น˜๋งˆํฌ ํšŒ๋กœ์— ๋Œ€ํ•œ ์‹คํ—˜ ๊ฒฐ๊ณผ๋Š” ๋ณธ ๋…ผ๋ฌธ์—์„œ ์ œ์•ˆ๋œ ๋ฐฉ๋ฒ•์ด ์›จ์ดํฌ์—… ์ง€์—ฐ ์‹œ๊ฐ„์„ ๋‘ ํด๋Ÿญ ์‚ฌ์ดํด๋กœ ์ œํ•œํ•˜๋ฉด์„œ๋„ ์ƒํƒœ ๋ณด์กด์— ํ•„์š”ํ•œ ์ €์žฅ ๊ณต๊ฐ„๊ณผ ํŒŒ์›Œ๋ฅผ ๊ฐ์†Œ์‹œํ‚ค๋Š”๋ฐ ๋งค์šฐ ํšจ๊ณผ์ ์ž„์„ ๋ณด์—ฌ์ค€๋‹ค.Low power design is of great importance in modern system-on-chips (SoCs). This dissertation studies on low power design methodologies for saving dynamic and static power consumption. Precisely, we unveil two novel techniques of cost effective low power design. Firstly, we propose a novel clock gating method for reducing the dynamic power consumption. Flip-flop's input data toggling based clock gating is one of the most commonly used clock gating methods, in which one critical and inherent limitation is the sharp increase of gating logic as more flip-flops are involved in gating. In this dissertation, we propose a new clock gating method to overcome this limitation. Specifically, (1) we analyze the resources of gating logic in the input data toggling based clock gating, from which an ineffectiveness in resource utilization is observed and we propose a new clock gating technique called flip-flop state driven clock gating which completely eliminates the essential and expensive component of XOR gates for detecting input toggling of flip-flops; (2) we provide the supporting logic circuitry of our proposed XOR-free clock gating, confirming its safe applicability through a comprehensive timing analysis; (3) we propose, based on the flip-flops' state profile, a clock gating methodology that seamlessly combines our flip-flop state based clock gating with the toggling based clock gating. Through experiments with benchmark circuits, it is confirmed that our clock gating method is very effective in reducing power, which otherwise the toggling based clock gating shall miss the power saving opportunity, while meeting all timing constraints. Secondly, for reducing the static power consumption, we solve two critical limitations of the conventional approaches to the allocation of state retention storage for power gated circuits. Those are (1) the long wakeup delay caused by the senseless use of multi-bit retention flip-flops (MBRFFs) and (2) the inability to optimize retention flip-flops for the flip-flops with mux-feedback loop. It should be noted that the conventional approaches have regarded the long wakeup delay as an inevitable consequence of maximizing the reduction of total storage size for state retention while they have treated the flip-flops with mux-feedback loop (called self-loop flip-flop) as nonoptimizable component, but practically, the self-loop flip-flops synthesized from hardware description language (HDL) code are not far from a small amount and thus, can in no way be negligible. More precisely, for solving (1), we show that the use of MBRFFs with up to two bits, consequently, constraining the wakeup delay to no more than two clock cycles, is enough to maintain the high reduction of total retention storage and for solving (2), we devise a 2-phase retention control mechanism for a pair of flip-flops, one of which has self-loop, by which just a single retention bit can be used to restore state of the two flip-flops, and propose an independent set based algorithm for maximally extracting the non-conflict pairs from circuits. Through experiments with benchmark circuits, it is shown that our proposed method is very effective against reducing the state retention storage and the power consumption compared with the existing best MBRFF allocation while the wakeup delay is strictly limited to two clock cycles.1 INTRODUCTION 1 1.1 Clock Gating 1 1.2 Power Gating and State Retention 3 1.3 Multi-bit Retention Registers 4 1.4 Contributions of This Dissertation 6 2 FLIP-FLOP STATE DRIVEN CLOCK GATING: CONCEPT, DESIGN, AND METHODOLOGY 9 2.1 Motivations 9 2.1.1 Toggling based Clock Gating 9 2.1.2 Area and Power by Clock Gating 10 2.2 The Proposed Clock Gating 13 2.2.1 Concept of Flip-flop State Driven Clock Gating 13 2.2.2 Design of Gating Logic Circuitry 17 2.2.3 Integrated Clock Gating Methodology 22 2.2.4 Cost Formulation 23 2.3 Experiments 25 2.3.1 Experimental Setup 25 2.3.2 Experimental Results 26 3 ALGORITHM AND DESIGN OPTIMIZATION OF ALLOCATING MULTI-BIT RETENTION FLIP-FLOPS FOR POWER GATED CIRCUITS 32 3.1 Motivations 32 3.1.1 Flip-flops with Mux-feedback Loop 32 3.1.2 Impact of Wakeup Delay 37 3.2 The Proposed Allocation Algorithm 39 3.3 Design of Multi-Bit Retention Flip-Flop and Multi-Bit Extension 48 3.3.1 Multi-Bit Retention Flip-Flop 48 3.3.2 Multi-Bit Flip-Flop Extension 52 3.4 Experiments 54 3.4.1 Experimental Setup 54 3.4.2 Experimental Results 57 4 CONCLUSIONS 65 4.1 Flip-flop State Driven Clock Gating: Concept, Design, and Methodology 65 4.2 Algorithm and Design Optimization of Allocating Multi-bit Retention Flip-flops for Power Gated Circuits 66 Abstract (In Korean) 71Docto

    INVESTIGATING POWER MANAGEMENT SCHEMES IN OUT-OF-ORDER MICROPROCESSORS

    Get PDF

    Reliability in the face of variability in nanometer embedded memories

    Get PDF
    In this thesis, we have investigated the impact of parametric variations on the behaviour of one performance-critical processor structure - embedded memories. As variations manifest as a spread in power and performance, as a first step, we propose a novel modeling methodology that helps evaluate the impact of circuit-level optimizations on architecture-level design choices. Choices made at the design-stage ensure conflicting requirements from higher-levels are decoupled. We then complement such design-time optimizations with a runtime mechanism that takes advantage of adaptive body-biasing to lower power whilst improving performance in the presence of variability. Our proposal uses a novel fully-digital variation tracking hardware using embedded DRAM (eDRAM) cells to monitor run-time changes in cache latency and leakage. A special fine-grain body-bias generator uses the measurements to generate an optimal body-bias that is needed to meet the required yield targets. A novel variation-tolerant and soft-error hardened eDRAM cell is also proposed as an alternate candidate for replacing existing SRAM-based designs in latency critical memory structures. In the ultra low-power domain where reliable operation is limited by the minimum voltage of operation (Vddmin), we analyse the impact of failures on cache functional margin and functional yield. Towards this end, we have developed a fully automated tool (INFORMER) capable of estimating memory-wide metrics such as power, performance and yield accurately and rapidly. Using the developed tool, we then evaluate the #effectiveness of a new class of hybrid techniques in improving cache yield through failure prevention and correction. Having a holistic perspective of memory-wide metrics helps us arrive at design-choices optimized simultaneously for multiple metrics needed for maintaining lifetime requirements

    Network-on-Chip

    Get PDF
    Addresses the Challenges Associated with System-on-Chip Integration Network-on-Chip: The Next Generation of System-on-Chip Integration examines the current issues restricting chip-on-chip communication efficiency, and explores Network-on-chip (NoC), a promising alternative that equips designers with the capability to produce a scalable, reusable, and high-performance communication backbone by allowing for the integration of a large number of cores on a single system-on-chip (SoC). This book provides a basic overview of topics associated with NoC-based design: communication infrastructure design, communication methodology, evaluation framework, and mapping of applications onto NoC. It details the design and evaluation of different proposed NoC structures, low-power techniques, signal integrity and reliability issues, application mapping, testing, and future trends. Utilizing examples of chips that have been implemented in industry and academia, this text presents the full architectural design of components verified through implementation in industrial CAD tools. It describes NoC research and developments, incorporates theoretical proofs strengthening the analysis procedures, and includes algorithms used in NoC design and synthesis. In addition, it considers other upcoming NoC issues, such as low-power NoC design, signal integrity issues, NoC testing, reconfiguration, synthesis, and 3-D NoC design. This text comprises 12 chapters and covers: The evolution of NoC from SoCโ€”its research and developmental challenges NoC protocols, elaborating flow control, available network topologies, routing mechanisms, fault tolerance, quality-of-service support, and the design of network interfaces The router design strategies followed in NoCs The evaluation mechanism of NoC architectures The application mapping strategies followed in NoCs Low-power design techniques specifically followed in NoCs The signal integrity and reliability issues of NoC The details of NoC testing strategies reported so far The problem of synthesizing application-specific NoCs Reconfigurable NoC design issues Direction of future research and development in the field of NoC Network-on-Chip: The Next Generation of System-on-Chip Integration covers the basic topics, technology, and future trends relevant to NoC-based design, and can be used by engineers, students, and researchers and other industry professionals interested in computer architecture, embedded systems, and parallel/distributed systems

    Low Power Memory/Memristor Devices and Systems

    Get PDF
    This reprint focusses on achieving low-power computation using memristive devices. The topic was designed as a convenient reference point: it contains a mix of techniques starting from the fundamental manufacturing of memristive devices all the way to applications such as physically unclonable functions, and also covers perspectives on, e.g., in-memory computing, which is inextricably linked with emerging memory devices such as memristors. Finally, the reprint contains a few articles representing how other communities (from typical CMOS design to photonics) are fighting on their own fronts in the quest towards low-power computation, as a comparison with the memristor literature. We hope that readers will enjoy discovering the articles within

    Design Techniques for Energy-Quality Scalable Digital Systems

    Get PDF
    Energy efficiency is one of the key design goals in modern computing. Increasingly complex tasks are being executed in mobile devices and Internet of Things end-nodes, which are expected to operate for long time intervals, in the orders of months or years, with the limited energy budgets provided by small form-factor batteries. Fortunately, many of such tasks are error resilient, meaning that they can toler- ate some relaxation in the accuracy, precision or reliability of internal operations, without a significant impact on the overall output quality. The error resilience of an application may derive from a number of factors. The processing of analog sensor inputs measuring quantities from the physical world may not always require maximum precision, as the amount of information that can be extracted is limited by the presence of external noise. Outputs destined for human consumption may also contain small or occasional errors, thanks to the limited capabilities of our vision and hearing systems. Finally, some computational patterns commonly found in domains such as statistics, machine learning and operational research, naturally tend to reduce or eliminate errors. Energy-Quality (EQ) scalable digital systems systematically trade off the quality of computations with energy efficiency, by relaxing the precision, the accuracy, or the reliability of internal software and hardware components in exchange for energy reductions. This design paradigm is believed to offer one of the most promising solutions to the impelling need for low-energy computing. Despite these high expectations, the current state-of-the-art in EQ scalable design suffers from important shortcomings. First, the great majority of techniques proposed in literature focus only on processing hardware and software components. Nonetheless, for many real devices, processing contributes only to a small portion of the total energy consumption, which is dominated by other components (e.g. I/O, memory or data transfers). Second, in order to fulfill its promises and become diffused in commercial devices, EQ scalable design needs to achieve industrial level maturity. This involves moving from purely academic research based on high-level models and theoretical assumptions to engineered flows compatible with existing industry standards. Third, the time-varying nature of error tolerance, both among different applications and within a single task, should become more central in the proposed design methods. This involves designing โ€œdynamicโ€ systems in which the precision or reliability of operations (and consequently their energy consumption) can be dynamically tuned at runtime, rather than โ€œstaticโ€ solutions, in which the output quality is fixed at design-time. This thesis introduces several new EQ scalable design techniques for digital systems that take the previous observations into account. Besides processing, the proposed methods apply the principles of EQ scalable design also to interconnects and peripherals, which are often relevant contributors to the total energy in sensor nodes and mobile systems respectively. Regardless of the target component, the presented techniques pay special attention to the accurate evaluation of benefits and overheads deriving from EQ scalability, using industrial-level models, and on the integration with existing standard tools and protocols. Moreover, all the works presented in this thesis allow the dynamic reconfiguration of output quality and energy consumption. More specifically, the contribution of this thesis is divided in three parts. In a first body of work, the design of EQ scalable modules for processing hardware data paths is considered. Three design flows are presented, targeting different technologies and exploiting different ways to achieve EQ scalability, i.e. timing-induced errors and precision reduction. These works are inspired by previous approaches from the literature, namely Reduced-Precision Redundancy and Dynamic Accuracy Scaling, which are re-thought to make them compatible with standard Electronic Design Automation (EDA) tools and flows, providing solutions to overcome their main limitations. The second part of the thesis investigates the application of EQ scalable design to serial interconnects, which are the de facto standard for data exchanges between processing hardware and sensors. In this context, two novel bus encodings are proposed, called Approximate Differential Encoding and Serial-T0, that exploit the statistical characteristics of data produced by sensors to reduce the energy consumption on the bus at the cost of controlled data approximations. The two techniques achieve different results for data of different origins, but share the common features of allowing runtime reconfiguration of the allowed error and being compatible with standard serial bus protocols. Finally, the last part of the manuscript is devoted to the application of EQ scalable design principles to displays, which are often among the most energy- hungry components in mobile systems. The two proposals in this context leverage the emissive nature of Organic Light-Emitting Diode (OLED) displays to save energy by altering the displayed image, thus inducing an output quality reduction that depends on the amount of such alteration. The first technique implements an image-adaptive form of brightness scaling, whose outputs are optimized in terms of balance between power consumption and similarity with the input. The second approach achieves concurrent power reduction and image enhancement, by means of an adaptive polynomial transformation. Both solutions focus on minimizing the overheads associated with a real-time implementation of the transformations in software or hardware, so that these do not offset the savings in the display. For each of these three topics, results show that the aforementioned goal of building EQ scalable systems compatible with existing best practices and mature for being integrated in commercial devices can be effectively achieved. Moreover, they also show that very simple and similar principles can be applied to design EQ scalable versions of different system components (processing, peripherals and I/O), and to equip these components with knobs for the runtime reconfiguration of the energy versus quality tradeoff

    Runtime Monitoring for Dependable Hardware Design

    Get PDF
    Mit dem Voranschreiten der Technologieskalierung und der Globalisierung der Produktion von integrierten Schaltkreisen erรถffnen sich eine Fรผlle von Schwachstellen bezรผglich der Verlรคsslichkeit von Computerhardware. Jeder Mikrochip wird aufgrund von Produktionsschwankungen mit einem einzigartigen Charakter geboren, welcher sich durch seine Arbeitsbedingungen, Belastung und Umgebung in individueller Weise entwickelt. Daher sind deterministische Modelle, welche zur Entwurfszeit die Verlรคsslichkeit prognostizieren, nicht mehr ausreichend um Integrierte Schaltkreise mit Nanometertechnologie sinnvoll abbilden zu kรถnnen. Der Bedarf einer Laufzeitanalyse des Zustandes steigt und mit ihm die notwendigen MaรŸnahmen zum Erhalt der Zuverlรคssigkeit. Transistoren sind anfรคllig fรผr auslastungsbedingte Alterung, die die Laufzeit der Schaltung erhรถht und mit ihr die Mรถglichkeit einer Fehlberechnung. Hinzu kommen spezielle Ablรคufe die das schnelle Altern des Chips befรถrdern und somit seine zuverlรคssige Lebenszeit reduzieren. Zusรคtzlich kรถnnen strahlungsbedingte Laufzeitfehler (Soft-Errors) des Chips abnormales Verhalten kritischer Systeme verursachen. Sowohl das Ausbreiten als auch das Maskieren dieser Fehler wiederum sind abhรคngig von der Arbeitslast des Systems. Fabrizierten Chips kรถnnen ebenfalls vorsรคtzlich wรคhrend der Produktion boshafte Schaltungen, sogenannte Hardwaretrojaner, hinzugefรผgt werden. Dies kompromittiert die Sicherheit des Chips. Da diese Art der Manipulation vor ihrer Aktivierung kaum zu erfassen ist, ist der Nachweis von Trojanern auf einem Chip direkt nach der Produktion extrem schwierig. Die Komplexitรคt dieser Verlรคsslichkeitsprobleme machen ein einfaches Modellieren der Zuverlรคssigkeit und GegenmaรŸnahmen ineffizient. Sie entsteht aufgrund verschiedener Quellen, eingeschlossen der Entwicklungsparameter (Technologie, Gerรคt, Schaltung und Architektur), der Herstellungsparameter, der Laufzeitauslastung und der Arbeitsumgebung. Dies motiviert das Erforschen von maschinellem Lernen und Laufzeitmethoden, welche potentiell mit dieser Komplexitรคt arbeiten kรถnnen. In dieser Arbeit stellen wir Lรถsungen vor, die in der Lage sind, eine verlรคssliche Ausfรผhrung von Computerhardware mit unterschiedlichem Laufzeitverhalten und Arbeitsbedingungen zu gewรคhrleisten. Wir entwickelten Techniken des maschinellen Lernens um verschiedene Zuverlรคssigkeitseffekte zu modellieren, zu รผberwachen und auszugleichen. Verschiedene Lernmethoden werden genutzt, um gรผnstige รœberwachungspunkte zur Kontrolle der Arbeitsbelastung zu finden. Diese werden zusammen mit Zuverlรคssigkeitsmetriken, aufbauend auf Ausfallsicherheit und generellen Sicherheitsattributen, zum Erstellen von Vorhersagemodellen genutzt. Des Weiteren prรคsentieren wir eine kosten-optimierte Hardwaremonitorschaltung, welche die รœberwachungspunkte zur Laufzeit auswertet. Im Gegensatz zum aktuellen Stand der Technik, welcher mikroarchitektonische รœberwachungspunkte ausnutzt, evaluieren wir das Potential von Arbeitsbelastungscharakteristiken auf der Logikebene der zugrundeliegenden Hardware. Wir identifizieren verbesserte Features auf Logikebene um feingranulare Laufzeitรผberwachung zu ermรถglichen. Diese Logikanalyse wiederum hat verschiedene Stellschrauben um auf hรถhere Genauigkeit und niedrigeren Overhead zu optimieren. Wir untersuchten die Philosophie, รœberwachungspunkte auf Logikebene mit Hilfe von Lernmethoden zu identifizieren und gรผnstigen Monitore zu implementieren um eine adaptive Vorbeugung gegen statisches Altern, dynamisches Altern und strahlungsinduzierte Soft-Errors zu schaffen und zusรคtzlich die Aktivierung von Hardwaretrojanern zu erkennen. Diesbezรผglich haben wir ein Vorhersagemodell entworfen, welches den Arbeitslasteinfluss auf alterungsbedingte Verschlechterungen des Chips mitverfolgt und dazu genutzt werden kann, dynamisch zur Laufzeit vorbeugende Techniken, wie Task-Mitigation, Spannungs- und Frequenzskalierung zu benutzen. Dieses Vorhersagemodell wurde in Software implementiert, welche verschiedene Arbeitslasten aufgrund ihrer Alterungswirkung einordnet. Um die Widerstandsfรคhigkeit gegenรผber beschleunigter Alterung sicherzustellen, stellen wir eine รœberwachungshardware vor, welche einen Teil der kritischen Flip-Flops beaufsichtigt, nach beschleunigter Alterung Ausschau hรคlt und davor warnt, wenn ein zeitkritischer Pfad unter starker Alterungsbelastung steht. Wir geben die Implementierung einer Technik zum Reduzieren der durch das Ausfรผhren spezifischer Subroutinen auftretenden Belastung von zeitkritischen Pfaden. Zusรคtzlich schlagen wir eine Technik zur Abschรคtzung von online Soft-Error-Schwachstellen von Speicherarrays und Logikkernen vor, welche auf der รœberwachung einer kleinen Gruppe Flip-Flops des Entwurfs basiert. Des Weiteren haben wir eine Methode basierend auf Anomalieerkennung entwickelt, um Arbeitslastsignaturen von Hardwaretrojanern wรคhrend deren Aktivierung zur Laufzeit zu erkennen und somit eine letzte Verteidigungslinie zu bilden. Basierend auf diesen Experimenten demonstriert diese Arbeit das Potential von fortgeschrittener Feature-Extraktion auf Logikebene und lernbasierter Vorhersage basierend auf Laufzeitdaten zur Verbesserung der Zuverlรคssigkeit von Harwareentwรผrfen

    An FPGA implementation of an investigative many-core processor, Fynbos : in support of a Fortran autoparallelising software pipeline

    Get PDF
    Includes bibliographical references.In light of the power, memory, ILP, and utilisation walls facing the computing industry, this work examines the hypothetical many-core approach to finding greater compute performance and efficiency. In order to achieve greater efficiency in an environment in which Mooreโ€™s law continues but TDP has been capped, a means of deriving performance from dark and dim silicon is needed. The many-core hypothesis is one approach to exploiting these available transistors efficiently. As understood in this work, it involves trading in hardware control complexity for hundreds to thousands of parallel simple processing elements, and operating at a clock speed sufficiently low as to allow the efficiency gains of near threshold voltage operation. Performance is there- fore dependant on exploiting a new degree of fine-grained parallelism such as is currently only found in GPGPUs, but in a manner that is not as restrictive in application domain range. While removing the complex control hardware of traditional CPUs provides space for more arithmetic hardware, a basic level of control is still required. For a number of reasons this work chooses to replace this control largely with static scheduling. This pushes the burden of control primarily to the software and specifically the compiler, rather not to the programmer or to an application specific means of control simplification. An existing legacy tool chain capable of autoparallelising sequential Fortran code to the degree of parallelism necessary for many-core exists. This work implements a many-core architecture to match it. Prototyping the design on an FPGA, it is possible to examine the real world performance of the compiler-architecture system to a greater degree than simulation only would allow. Comparing theoretical peak performance and real performance in a case study application, the system is found to be more efficient than any other reviewed, but to also significantly under perform relative to current competing architectures. This failing is apportioned to taking the need for simple hardware too far, and an inability to implement static scheduling mitigating tactics due to lack of support for such in the compiler

    A fault-tolerant multiprocessor architecture for aircraft, volume 1

    Get PDF
    A fault-tolerant multiprocessor architecture is reported. This architecture, together with a comprehensive information system architecture, has important potential for future aircraft applications. A preliminary definition and assessment of a suitable multiprocessor architecture for such applications is developed

    Voyager spacecraft. Volume V - Alternate designs, subsystems considerations Study report, phase IA

    Get PDF
    Telecommunication, propulsion, control, electric, and mechanical subsystems design for Voyager spacecraf
    • โ€ฆ
    corecore