    Clock Tree Power Optimization of Three Dimensional VLSI System with Network

    Abstract: Minimizing the size of a clock tree is known to be an effective way to reduce power dissipation in modern circuit designs. However, most existing power-aware clock-tree minimization algorithms optimize power on the basis of flip-flops alone, which may limit the achievable power savings. To achieve a power and timing tradeoff, this paper investigates the use of pulsed latches in a clock tree for further power savings. This is the first paper to propose a migration approach that efficiently constructs a clock tree containing both pulsed latches and flip-flops. The proposed method is based on a minimum-cost maximum-flow formulation that globally determines the tree topology, maintaining load balance and accounting for the wirelength between pulse generators and pulsed latches. Experimental results indicate that, compared with the most recent prior work, the proposed migration approach reduces power consumption by 12% and 13% on average, with 7% and 70% skew improvements, on the industrial circuits and the ISPD-2010 benchmarks, respectively.
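The abstract names a minimum-cost maximum-flow formulation but not its network. As a rough illustration only, the sketch below assumes a hypothetical model: every pulsed latch must be driven by exactly one pulse generator, each generator may drive at most `max_load` latches (a crude stand-in for load balance), and the cost of an assignment is the Manhattan distance between the two cells, used as a wirelength proxy. `MinCostFlow`, `assign_latches`, and the coordinates are illustrative names and values, not taken from the paper.

```python
from collections import deque


class MinCostFlow:
    """Min-cost max-flow by successive shortest paths (SPFA on the residual graph)."""

    def __init__(self, n):
        self.n = n
        self.graph = [[] for _ in range(n)]  # each edge: [to, capacity, cost, index of reverse edge]

    def add_edge(self, u, v, cap, cost):
        self.graph[u].append([v, cap, cost, len(self.graph[v])])
        self.graph[v].append([u, 0, -cost, len(self.graph[u]) - 1])

    def solve(self, s, t):
        flow = total_cost = 0
        while True:
            # Cheapest augmenting path; residual edges may carry negative cost.
            dist = [float("inf")] * self.n
            prev = [None] * self.n  # (node, edge index) used to reach each node
            in_queue = [False] * self.n
            dist[s] = 0
            queue = deque([s])
            while queue:
                u = queue.popleft()
                in_queue[u] = False
                for i, (v, cap, cost, _) in enumerate(self.graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v] = dist[u] + cost
                        prev[v] = (u, i)
                        if not in_queue[v]:
                            in_queue[v] = True
                            queue.append(v)
            if dist[t] == float("inf"):
                return flow, total_cost
            # Bottleneck capacity along the path, then augment along it.
            push, v = float("inf"), t
            while v != s:
                u, i = prev[v]
                push = min(push, self.graph[u][i][1])
                v = u
            v = t
            while v != s:
                u, i = prev[v]
                edge = self.graph[u][i]
                edge[1] -= push
                self.graph[v][edge[3]][1] += push
                v = u
            flow += push
            total_cost += push * dist[t]


def assign_latches(latches, generators, max_load):
    """Hypothetical helper: assign each latch to one generator, at most max_load per
    generator, minimizing total Manhattan distance (a wirelength proxy)."""
    n_l, n_g = len(latches), len(generators)
    s, t = n_l + n_g, n_l + n_g + 1
    mcf = MinCostFlow(n_l + n_g + 2)
    for i, (lx, ly) in enumerate(latches):
        mcf.add_edge(s, i, 1, 0)  # each latch needs exactly one driver
        for j, (gx, gy) in enumerate(generators):
            mcf.add_edge(i, n_l + j, 1, abs(lx - gx) + abs(ly - gy))
    for j in range(n_g):
        mcf.add_edge(n_l + j, t, max_load, 0)  # load cap per generator
    _, wirelength = mcf.solve(s, t)
    assignment = {}
    for i in range(n_l):
        for v, cap, _, _ in mcf.graph[i]:
            if n_l <= v < n_l + n_g and cap == 0:  # saturated latch-to-generator edge
                assignment[i] = v - n_l
    return assignment, wirelength


if __name__ == "__main__":
    print(assign_latches([(0, 0), (1, 0), (2, 0)], [(0, 0), (10, 0)], max_load=2))
    # ({0: 0, 1: 0, 2: 1}, 9)
```

With `max_load=2`, the third latch is pushed onto the distant generator even though the nearer one would be cheaper, which shows how a load-balance constraint and a wirelength cost interact in a formulation of this kind.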

    Cost-Effective Clock and Power Gating Design Methodology

    Doctoral dissertation (Ph.D.), Seoul National University Graduate School, College of Engineering, Department of Electrical and Computer Engineering, February 2020. Advisor: Taewhan Kim.
Low-power design is of great importance in modern systems-on-chip (SoCs). This dissertation studies low-power design methodologies for reducing dynamic and static power consumption. Specifically, we present two novel techniques for cost-effective low-power design. First, we propose a novel clock gating method for reducing dynamic power consumption. Clock gating based on the toggling of flip-flops' input data is one of the most commonly used clock gating methods, but it has a critical, inherent limitation: the gating logic grows sharply as more flip-flops are involved in gating. In this dissertation, we propose a new clock gating method that overcomes this limitation. Specifically, (1) we analyze the gating-logic resources of input-data-toggling-based clock gating, observe that they are used ineffectively, and propose a new technique called flip-flop state driven clock gating, which completely eliminates the XOR gates that are essential to, and expensive in, detecting the input toggling of flip-flops; (2) we provide the supporting logic circuitry for the proposed XOR-free clock gating and confirm through a comprehensive timing analysis that it can be applied safely; and (3) based on the flip-flops' state profile, we propose a clock gating methodology that seamlessly combines flip-flop state based clock gating with toggling-based clock gating.
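The abstract specifies the conventional toggling-based scheme only in outline and does not give the XOR-free state-driven circuit, so the behavioural sketch below models only the conventional scheme that the dissertation improves on. It assumes one XOR per flip-flop comparing D with Q and an OR that merges them into a shared group enable; the XOR count therefore grows linearly with the group size, which is the gating-logic overhead the abstract describes. `toggle_gated_clock` is a hypothetical name.

```python
def toggle_gated_clock(d_trace, q_init):
    """Simulate input-data-toggling-based clock gating for one group of flip-flops.

    d_trace[k][i] is the D input of flip-flop i in cycle k; q_init holds the initial Q values.
    Returns (per-cycle gate enables, final Q values, number of XOR gates in the gating logic).
    """
    q = list(q_init)
    enables = []
    for d in d_trace:
        # One XOR per flip-flop: does D differ from the currently stored Q?
        toggles = [di ^ qi for di, qi in zip(d, q)]
        # OR of all toggle signals: the shared clock is enabled if any bit would change.
        enable = int(any(toggles))
        if enable:
            q = list(d)  # a clock pulse reaches every flip-flop in the group
        # When enable == 0, D == Q for every flip-flop, so skipping the pulse loses no state.
        enables.append(enable)
    return enables, q, len(q_init)


if __name__ == "__main__":
    enables, q, xors = toggle_gated_clock([[0, 0], [1, 0], [1, 0], [1, 1]], [0, 0])
    print(enables, q, xors)  # [0, 1, 0, 1] [1, 1] 2
```

Here the clock is suppressed in two of four cycles at the cost of two XOR gates; adding flip-flops to the group adds XORs and also makes it more likely that at least one bit toggles, so fewer cycles are gated.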
Experiments with benchmark circuits confirm that our clock gating method is very effective in reducing power, capturing power-saving opportunities that toggling-based clock gating alone would miss, while meeting all timing constraints. Second, for reducing static power consumption, we resolve two critical limitations of conventional approaches to allocating state retention storage for power-gated circuits: (1) the long wakeup delay caused by the indiscriminate use of multi-bit retention flip-flops (MBRFFs), and (2) the inability to optimize retention flip-flops for flip-flops with a mux-feedback loop. Conventional approaches have regarded a long wakeup delay as an inevitable consequence of maximizing the reduction in total state retention storage, and they have treated flip-flops with a mux-feedback loop (called self-loop flip-flops) as non-optimizable. In practice, however, self-loop flip-flops synthesized from hardware description language (HDL) code are far from rare and cannot be neglected. More precisely, to solve (1), we show that using MBRFFs of at most two bits, which constrains the wakeup delay to no more than two clock cycles, is enough to maintain a high reduction in total retention storage. To solve (2), we devise a 2-phase retention control mechanism for a pair of flip-flops, one of which has a self-loop, that allows a single retention bit to restore the state of both flip-flops, and we propose an independent set based algorithm for maximally extracting non-conflicting pairs from circuits.
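The abstract names an independent set based algorithm for extracting non-conflicting flip-flop pairs but does not define the conflict rule or the algorithm. The sketch below is a generic greedy minimum-degree heuristic, not the dissertation's algorithm, under one assumed (hypothetical) conflict rule: two candidate pairs conflict if they share a flip-flop. `select_pairs` and the flip-flop names are illustrative.

```python
from itertools import combinations


def select_pairs(candidates):
    """Greedy minimum-degree heuristic for an independent set in a pair-conflict graph.

    candidates: list of (ff_a, ff_b) flip-flop pairs that could share one retention bit.
    Assumed conflict rule (hypothetical): two pairs conflict if they share a flip-flop.
    """
    n = len(candidates)
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if set(candidates[i]) & set(candidates[j]):
            adj[i].add(j)
            adj[j].add(i)
    remaining = set(range(n))
    chosen = []
    while remaining:
        # Take the pair that conflicts with the fewest still-available pairs (ties: lowest index).
        v = min(remaining, key=lambda k: (len(adj[k] & remaining), k))
        chosen.append(candidates[v])
        remaining -= adj[v] | {v}  # drop v and every pair it conflicts with
    return chosen


if __name__ == "__main__":
    print(select_pairs([("a", "b"), ("b", "c"), ("c", "d"), ("e", "f")]))
    # [('e', 'f'), ('a', 'b'), ('c', 'd')]
```

If every selected pair needs one retention bit instead of two, this example saves three bits. Note that if conflicts really arose only from shared flip-flops, the problem would reduce to maximum matching and could be solved exactly; the independent set view is more general and accommodates other conflict rules.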
Experiments with benchmark circuits show that, compared with the best existing MBRFF allocation, our proposed method is very effective in reducing both the state retention storage and the power consumption while strictly limiting the wakeup delay to two clock cycles.

    Table of contents:
    1 INTRODUCTION
      1.1 Clock Gating
      1.2 Power Gating and State Retention
      1.3 Multi-bit Retention Registers
      1.4 Contributions of This Dissertation
    2 FLIP-FLOP STATE DRIVEN CLOCK GATING: CONCEPT, DESIGN, AND METHODOLOGY
      2.1 Motivations
        2.1.1 Toggling based Clock Gating
        2.1.2 Area and Power by Clock Gating
      2.2 The Proposed Clock Gating
        2.2.1 Concept of Flip-flop State Driven Clock Gating
        2.2.2 Design of Gating Logic Circuitry
        2.2.3 Integrated Clock Gating Methodology
        2.2.4 Cost Formulation
      2.3 Experiments
        2.3.1 Experimental Setup
        2.3.2 Experimental Results
    3 ALGORITHM AND DESIGN OPTIMIZATION OF ALLOCATING MULTI-BIT RETENTION FLIP-FLOPS FOR POWER GATED CIRCUITS
      3.1 Motivations
        3.1.1 Flip-flops with Mux-feedback Loop
        3.1.2 Impact of Wakeup Delay
      3.2 The Proposed Allocation Algorithm
      3.3 Design of Multi-Bit Retention Flip-Flop and Multi-Bit Extension
        3.3.1 Multi-Bit Retention Flip-Flop
        3.3.2 Multi-Bit Flip-Flop Extension
      3.4 Experiments
        3.4.1 Experimental Setup
        3.4.2 Experimental Results
    4 CONCLUSIONS
      4.1 Flip-flop State Driven Clock Gating: Concept, Design, and Methodology
      4.2 Algorithm and Design Optimization of Allocating Multi-bit Retention Flip-flops for Power Gated Circuits
    Abstract (In Korean)

    Voltage stacking for near/sub-threshold operation

    Architecture Independent Timing Speculation Techniques in VLSI Circuits.

    Conventional digital circuits must ensure correct operation across a wide range of operating conditions, including process, voltage, and temperature variation. These conditions affect circuit delays, so safety margins must be added, at a cost in power and performance. The Razor system proposed eliminating these timing margins by running a circuit with occasional timing errors and correcting the errors when they occur. Several Razor-style designs have been proposed; however, prior to this work, Razor could not be applied blindly or automatically to designs, because the various error correction schemes modified the architecture of the target design. Because of the architectural invasiveness and design complexity of these techniques, no published Razor-style system had been applied to a complete, existing commercial processor. Additionally, all prior Razor-style systems face a fundamental tradeoff between the speculation window and short-path (minimum delay) constraints, which limits the technique's effectiveness. This thesis introduces the concept of Razor using two-phase latch-based timing. By identifying and utilizing time borrowing as an error correction mechanism, it allows Razor to be applied without reloading data or replaying instructions. As a result, Razor can be applied blindly and automatically to existing designs without detailed knowledge of their internal architecture. Additionally, latch-based Razor allows large speculation windows, up to 100% of the nominal circuit delay, because it breaks the connection between minimum delay constraints and the speculation window. By demonstrating how to transform conventional flip-flop based designs, including those that use clock gating, to two-phase latch-based timing, this work allows Razor to be added automatically to a large set of existing digital designs. Two forms of latch-based Razor are proposed.
First, Bubble Razor ripples stall cycles through a circuit in response to timing errors; it is applied to the ARM Cortex-M3 processor, the first ever application of a Razor technique to a complete, existing processor design. Additional work applies Bubble Razor to the ARM Cortex-R4 processor. The second latch-based Razor technique, Voltage Razor, uses voltage boosting to correct timing errors.
PhD, Electrical Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/102461/1/mfojtik_1.pd
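The tradeoff between speculation window and short-path constraints can be made concrete with a simplified model of conventional flip-flop-based Razor, the kind of design the thesis contrasts with its latch-based approach. The model assumes a main flip-flop that samples at the clock edge and a shadow latch that samples the same data `window` later; setup and hold times are ignored, and `razor_outcome` is a hypothetical name. It is not the thesis's latch-based scheme.

```python
def razor_outcome(max_delay, min_delay, period, window):
    """Classify a path under a simplified flip-flop-based Razor timing model.

    The main flip-flop samples at `period`; a shadow latch samples the same data at
    `period + window`. All times share one unit (e.g. ns); setup/hold times are ignored.
    """
    if min_delay <= window:
        # Data launched at the next edge can race into the shadow latch before it closes.
        return "short-path violation"
    if max_delay <= period:
        return "correct"
    if max_delay <= period + window:
        # Main flip-flop missed the data but the shadow latch caught it: a mismatch flags an error.
        return "error detected"
    return "undetected failure"


if __name__ == "__main__":
    for max_d, min_d in [(0.9, 0.5), (1.2, 0.5), (1.4, 0.5), (1.2, 0.2)]:
        print(max_d, min_d, razor_outcome(max_d, min_d, period=1.0, window=0.3))
```

Widening `window` lets the shadow latch catch slower paths, but every short path must then have a minimum delay larger than `window`, which is the coupling the abstract says latch-based Razor breaks.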

    Power Efficient Data-Aware SRAM Cell for SRAM-Based FPGA Architecture

    The design of low-power SRAM cells has become a necessity in today's FPGAs, because SRAM is a critical component in FPGA design and consumes a large fraction of the total power. This chapter provides an overview of the factors responsible for power consumption in FPGAs and discusses design techniques for low-power SRAM-based FPGAs at the system, device, and architecture levels. Finally, the chapter proposes a data-aware dynamic SRAM cell to control power consumption in the cell. The stack effect is used in the design to reduce leakage current. Peripheral circuits such as the address decoder, the write/read enable circuits, and the sense amplifier have been modified to implement a power-efficient SRAM-based FPGA.

    Low power VLSI design of a fir filter using dual edge triggered clocking strategy

    Digital signal processing is an area of science and engineering that has developed rapidly over the past 30 years. This rapid development is a result of significant advances in digital computer technology and integrated-circuit fabrication. DSP processors are a diverse group, but most share some common features designed to support fast execution of the repetitive, numerically intensive computations characteristic of digital signal processing algorithms. The most often cited of these features is the ability to perform a multiply-accumulate operation (often called a "MAC") in a single instruction cycle. Hence, in this project a DSP processor is designed that can perform basic DSP operations such as convolution, the Fourier transform, and filtering. The processor is a simple 4-bit processor with a single 8-bit data line and a single 16-bit address bus. With a set of branch instructions, the DSP operates as a CISC processor with strong math capabilities and can perform the DSP operations mentioned above. The application I have chosen is a low-power FIR filter using a dual-edge clocking strategy. It combines two novel power-reduction techniques: multi-stage clock gating, and symmetric two-phase level-sensitive clocking with glitch-aware redistribution of data-path registers. Simulation results confirm a 42% reduction in power compared with single-edge-triggered clocking with clock gating. To reduce power consumption further, a low-power latch circuit is used. Thanks to partial pass-transistor logic, it trades time for energy, making it particularly suitable for low-power, low-frequency applications. Simulation results confirm the power reduction. The techniques discussed can be applied to portable devices that need longer battery life and to ASICs.
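The abstract singles out the single-cycle MAC as the key DSP feature and uses an FIR filter as its application. The behavioural sketch below, which is not the project's processor design, shows that a direct-form FIR filter is just a sequence of multiply-accumulate operations, one per tap, over a shifting delay line. `fir_filter` is a hypothetical name.

```python
def fir_filter(samples, coeffs):
    """Direct-form FIR filter: y[n] = sum_k h[k] * x[n - k], one multiply-accumulate (MAC) per tap."""
    delay_line = [0] * len(coeffs)  # holds x[n], x[n-1], ..., x[n-N+1]
    outputs = []
    for x in samples:
        delay_line = [x] + delay_line[:-1]  # shift the newest sample in, drop the oldest
        acc = 0  # accumulator register
        for h, xk in zip(coeffs, delay_line):
            acc += h * xk  # one MAC
        outputs.append(acc)
    return outputs


if __name__ == "__main__":
    print(fir_filter([1, 0, 0, 0], [1, 2, 3]))  # impulse response: [1, 2, 3, 0]
```

An N-tap filter needs N MACs per output sample, which is why a processor that completes one MAC per instruction cycle suits this workload.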

    Design methodology and productivity improvement in high speed VLSI circuits

    2017 Spring. Includes bibliographical references. To view the abstract, please see the full text of the document.

    Robust Circuit Design for Low-Voltage VLSI.

    Voltage scaling is an effective way to reduce overall power consumption, but the major challenges of low-voltage operation include performance degradation and reliability issues due to PVT variations. This dissertation discusses three key circuit components that are critical in low-voltage VLSI. Level converters must provide a reliable interface between two voltage domains, but the reduced on/off-current ratio makes robust conversion extremely difficult at low voltages. Two static designs are proposed: LC2 adopts a novel pulsed operation and modulates its pull-up strength depending on its state, with 3-sigma robustness guaranteed using a current margin plot; SLC inherently reduces contention through diode insertion. Improvements in performance, power, and robustness are measured on 130nm CMOS test chips. SRAM is a major bottleneck in voltage scaling because of its inherently ratioed bitcell design. The proposed 7T SRAM alleviates the area overhead incurred by 8T bitcells and provides robust operation down to 0.32V in 180nm CMOS test chips with 3.35fW/bit leakage. Auto-Shut-Off provides a 6.8x READ energy reduction, and its innate Quasi-Static READ is demonstrated, showing a much improved READ error rate. The use of a PMOS Pass-Gate improves half-select robustness by directly modulating the device strength through the bitline voltage. Clocked sequential elements, flip-flops for short, are ubiquitous in today's digital systems. The proposed S2CFF is static, single-phase, and contention-free, and has the same number of devices as a TGFF. It shows a 40% power reduction as well as robust low-voltage operation in fabricated 45nm SOI test chips. Its simple hold-time path and a 3.4x improvement in 3-sigma hold time are presented. A new on-chip flip-flop testing harness is also proposed, and measured hold-time variations of flip-flops are presented.
PhD, Electrical Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/111525/1/yejoong_1.pd
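The abstract states robustness and hold-time results at 3 sigma. A minimal sketch of what a 3-sigma criterion means, assuming approximately Gaussian variation and a margin that must stay positive: the mean minus three standard deviations of sampled margins must exceed zero. The function names and the sample values are hypothetical and are not the dissertation's measured data.

```python
import statistics


def three_sigma_margin(samples):
    """Return mean - 3 * (sample standard deviation) of a set of margin measurements."""
    return statistics.mean(samples) - 3 * statistics.stdev(samples)


def is_3sigma_robust(samples):
    """Treat a design as 3-sigma robust if its margin stays positive at mean - 3 sigma."""
    return three_sigma_margin(samples) > 0


if __name__ == "__main__":
    margins = [1.0, 1.2, 0.8, 1.1, 0.9]  # hypothetical current margins, arbitrary units
    print(round(three_sigma_margin(margins), 3), is_3sigma_robust(margins))
```

Under the Gaussian assumption, a positive mean minus 3 sigma corresponds to failure in roughly 0.1% of instances on the low side, which is why tightening the spread can matter as much as raising the mean.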

    Doctor of Philosophy

    Dissertation. Communication surpasses computation as the power and performance bottleneck in forthcoming exascale processors. Scaling has made transistors cheap, but on-chip wires have grown more expensive in terms of both latency and energy. The need for low-energy, high-performance interconnects is therefore pronounced, especially for long-distance communication. In this work, we examine two aspects of the global signaling problem. The first part of the thesis focuses on a high-bandwidth asynchronous signaling protocol for long-distance communication. Asynchrony among intellectual property (IP) cores on a chip has become necessary in a system-on-chip (SoC) environment. Traditional asynchronous handshaking protocols lose throughput because of the latency added by sending the acknowledge signal back to the sender. We demonstrate a method that supports end-to-end communication across links with arbitrarily large latency without limiting bandwidth, as long as line variation can be reliably controlled. We also evaluate the energy and latency improvements resulting from the design choices that this protocol makes available. The use of transmission lines as a physical interconnect medium shows promise for deep-submicron technologies: in our evaluations, transmission-line interconnects show a lower energy footprint as well as vastly reduced wire latency. We approach this problem from two sides. Using field solvers, we investigate physical design choices to determine the optimal way to implement these lines for a given back-end-of-line (BEOL) stack. We also approach the problem from a system designer's viewpoint, looking at ways to optimize the lines for different performance targets. This work analyzes the advantages and pitfalls of implementing asynchronous channel protocols for communication over long distances.
Finally, the innovations resulting from this work are applied to a network-on-chip design example, and the resulting power-performance benefits are reported.
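The abstract attributes the throughput loss of traditional handshaking to the acknowledge round trip. A first-order model makes this concrete, under the assumption that each transfer must wait for its acknowledge before the next request is sent; `legs` counts one-way link traversals per transfer (2 for a two-phase handshake, 4 for a four-phase one). The function names and latency values are hypothetical. The model says nothing about how the dissertation's protocol keeps the link full; it only shows the latency-independent throughput such a protocol aims for.

```python
def handshake_throughput(link_latency, sender_delay, legs=2):
    """Transfers per unit time when each transfer must wait for its acknowledge.

    legs = 2 models a two-phase (transition) handshake: request out, acknowledge back.
    legs = 4 models a four-phase (return-to-zero) handshake.
    """
    return 1.0 / (legs * link_latency + sender_delay)


def pipelined_throughput(issue_interval):
    """Transfers per unit time when data is launched every issue_interval without waiting for an ack."""
    return 1.0 / issue_interval


if __name__ == "__main__":
    for latency in (0.5, 1.0, 2.0, 4.0):  # hypothetical link latencies, in ns
        print(latency, handshake_throughput(latency, 0.2), pipelined_throughput(0.2))
```

Handshake throughput falls roughly as one over twice the link latency, while the pipelined bound stays fixed, which is why long links favour protocols that do not wait for a per-transfer acknowledge.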

    A sub-mW IoT-endnode for always-on visual monitoring and smart triggering

    This work presents a fully programmable Internet of Things (IoT) visual sensing node that targets sub-mW power consumption in always-on monitoring scenarios. The system features a spatial-contrast 128x64 binary pixel imager with focal-plane processing. When working in its lowest power mode (10 µW at 10 fps), the sensor outputs the number of changed pixels. Based on this information, a dedicated camera interface, implemented on a low-power FPGA, wakes up an ultra-low-power parallel processing unit to extract context-aware visual information. We evaluate the smart sensor on three always-on visual triggering application scenarios. Triggering accuracy comparable to RGB image sensors is achieved under nominal lighting conditions, while consuming an average power between 193 µW and 277 µW, depending on context activity. The digital subsystem is extremely flexible, thanks to a fully programmable digital signal processing engine, yet still achieves 19x lower power consumption than MCU-based cameras with significantly lower on-board computing capabilities. Comment: 11 pages, 9 figures, submitted to IEEE IoT Journal
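The abstract says the sensor reports only the number of changed pixels and that the FPGA-based camera interface uses this count to wake the processing unit. The sketch below models that decision in software, assuming a simple fixed threshold; the abstract does not give the actual wake-up rule, and in the real system the count is produced by focal-plane processing rather than computed from full frames. `changed_pixels`, `should_wake`, and the threshold are hypothetical.

```python
def changed_pixels(prev_frame, frame):
    """Count pixels whose binary value differs between two frames (lists of rows of 0/1)."""
    return sum(p != c for prev_row, row in zip(prev_frame, frame) for p, c in zip(prev_row, row))


def should_wake(prev_frame, frame, threshold):
    """Wake the processing unit when at least `threshold` pixels changed."""
    return changed_pixels(prev_frame, frame) >= threshold


if __name__ == "__main__":
    prev = [[0, 0], [1, 1]]  # tiny stand-in for a 128x64 binary frame
    cur = [[1, 0], [1, 0]]
    print(changed_pixels(prev, cur), should_wake(prev, cur, threshold=2))  # 2 True
```

Keeping the processor asleep until the change count crosses the threshold is what lets average power track scene activity, consistent with the activity-dependent range the abstract reports.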