343 research outputs found

    Low Power Digital Design using Asynchronous Logic

    Get PDF
    This thesis summarizes research undertaken at San Josรฉ State University between January 2009 and May 2011, which introduces a new method of achieving low power by reducing the dependency of the clock signal in the design. A clock signal consumes power even when the circuit is idle, but asynchronous circuits by default move into the idle state and involve no transition in the circuit during that state. In addition, in an active system, only the subsystem that is in use dissipates power. This work mainly focused on obtaining low power by implementing asynchronous logic. The work also studied the measure of power consumption using asynchronous logic by designing a simple Display Controller. The Display Controller was designed using Verilog HDL and synthesized using Synopsys Design Compiler. The work also studied the tradeโ€“offs in power, area, and design complexity in asynchronous design. The power consumed by the synchronous and asynchronous display controllers was measured, and the asynchronous design consumed about 17% less power than its synchronous counterpart. The area of the asynchronous design was twice that of the synchronous one. Power can be reduced by reducing the dependency of the clock signal in the design by choosing asynchronous logic

    ํด๋Ÿญ ๊ฒŒ์ดํŒ… ๋ฐ ํ”Œ๋ฆฝ ํ”Œ๋กญ ๋™์‹œ ์ตœ์ ํ™”๋ฅผ ์œ„ํ•œ ์„ค๊ณ„ ๋ฐ ์•Œ๊ณ ๋ฆฌ์ฆ˜

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ (์„์‚ฌ)-- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ๊ณต๊ณผ๋Œ€ํ•™ ์ „๊ธฐยท์ •๋ณด๊ณตํ•™๋ถ€, 2019. 2. ๊น€ํƒœํ™˜.๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ํ‘œ์ค€ ์…€์—์„œ๋ถ€ํ„ฐ ๋ฐฐ์น˜ ๋‹จ๊ณ„์— ์ด๋ฅด๋Š” ๋‹ค์–‘ํ•œ ์„ค๊ณ„๋‹จ์—์—์„œ ์นฉ์˜ ๋™์  ์ „๋ ฅ์„ ์ตœ์ ํ™” ๊ธฐ๋ฒ•์„ ์†Œ๊ฐœํ•œ๋‹ค. ์ด ์—ฐ๊ตฌ๋Š” ์šฐ์„  ๋ฐ์ดํ„ฐ ๊ตฌ๋™ํ˜• (์ฆ‰, ํ† ๊ธ€๋ง ๊ธฐ๋ฐ˜) ํด๋Ÿญ ๊ฒŒ์ดํŒ…์ด ์ข…๋ž˜ ํด๋Ÿญ ๊ฒŒ์ดํŒ… ๊ธฐ๋ฒ•๋“ค์—์„œ ๊ฒฐ์ฝ” ๋‹ค๋ฃจ์–ด์ง€์ง€ ์•Š์•˜๋˜ ํ”Œ๋ฆฝ ํ”Œ ๋กญ์˜ ํ•ฉ์„ฑ๊ณผ ๋ฐ€์ ‘ํ•˜๊ฒŒ ํ†ตํ•ฉ๋  ์ˆ˜ ์žˆ๋Š” ๋ฐฉ๋ฒ•์„ ์—ฐ๊ตฌํ•œ๋‹ค. ์šฐ๋ฆฌ์˜ ๊ด€์ธก์˜ ํ•ต์‹ฌ์€ ํ”Œ๋ฆฝ ํ”Œ๋กญ ์…€์˜ ์ผ๋ถ€ ๋‚ด๋ถ€ ๋ถ€ํ’ˆ์ด ํด๋Ÿญ ๊ฒŒ์ดํŒ… ์ธ์—์ด๋ธ” ์‹ ํ˜ธ๋ฅผ ์ƒ์„ฑ ํ•˜๊ธฐ ์œ„ํ•ด ์žฌ์‚ฌ์šฉ ๋  ์ˆ˜ ์žˆ๋‹ค๋Š” ๊ฒƒ์ด๋‹ค. ์ด๋ฅผ ๋ฐ”ํƒ•์œผ๋กœ eXOR-FF ๋ผ๊ณ  ๋ถˆ๋ฆฌ๋Š” ์ƒˆ๋กญ๊ฒŒ ์ตœ์ ํ™”๋œ ํ”Œ๋ฆฝ ํ”Œ๋กญ ๋ฐฐ์„  ๊ตฌ์กฐ๋ฅผ ์ œ์•ˆํ•ฉ๋‹ˆ๋‹ค. ์ด ๊ตฌ์กฐ์—์„œ๋Š” ๋งค ํด๋Ÿญ ์ฃผ๊ธฐ๋งˆ๋‹ค ๋‚ด๋ถ€ ๋กœ์ง์„ ์žฌ์‚ฌ์šฉ ํ•˜์—ฌ ํด๋Ÿญ ๊ฒŒ์ดํŒ…์„ ํ†ตํ•ด ํ”Œ๋ฆฝ ํ”Œ๋กญ์„ ํ™œ์„ฑํ™”ํ• ์ง€ ๋˜๋Š” ๋น„ํ™œ์„ฑํ™”ํ• ์ง€ ๊ฒฐ์ •ํ•ฉ๋‹ˆ๋‹ค. ๋ชจ๋“  ์Œ์˜ ํ”Œ๋ฆฝ ํ”Œ๋กญ ๋ฐ ํ† ๊ธ€๋ฆด ๊ฐ์ง€ ๋กœ์ง์—์„œ์˜ ์˜์—ญ์„ ์ ˆ์•ฝํ•จ์— ๋”ฐ๋ผ์„œ ๋ˆ„์„ค ๋ฐ ๋™์  ์ „๋ ฅ์˜ ์ ˆ์ „ ํšจ๊ณผ๋ฅผ ๋‹ฌ์„ฑํ•ฉ๋‹ˆ๋‹ค. ๊ทธ๋Ÿฐ ๋‹ค์Œ, ๋‘ ๊ฐ€์ง€๊ณ ์œ ํ•œ ์žฅ์ ์„ ์ œ๊ณตํ•˜๋Š” ๋ฐฐ์น˜/ํƒ€์ด๋ฐ ์ธ์‹ ํด๋Ÿญ ๊ฒŒ์ดํŒ… ํƒ์ƒ‰์— ๋Œ€ํ•œ ํฌ๊ด„์ ์ธ ๋ฐฉ๋ฒ•๋ก ์„ ์ œ์•ˆํ•ฉ๋‹ˆ๋‹ค. ํ•ด๋‹น ๋ฐฉ ๋ฒ•๋ก ์€ eXOR-FF ์˜ ์ด์ ์„ ๊ทน๋Œ€ํ™”ํ•˜๊ณ , ์ „๋ ฅ ์†Œ๋น„ ๋ฐ ํƒ€์ด๋ฐ ์˜ํ–ฅ์˜ ๋ถ„ํ•ด์— ๋Œ€ํ•œ ์ •๋ฐ€ ๋ถ„์„์„ ์ˆ˜ํ–‰ํ•˜๊ณ  ํ‹€๋Ÿญ ๊ฒŒ์ดํŒ… ์ฐธ์ƒ‰์˜ ํ•ต์‹ฌ ์—”์ง„์„ ๋น„์šฉ๊ธฐ๋Šฅ์œผ๋กœ ๋ณ€ํ™˜ํ•˜๋Š”๋ฐ ๊ฐ€์žฅ ์ ํ•ฉํ•ฉ๋‹ˆ๋‹ค. ISCAS89, ITC89, ITC99 ๋ฐ IWLS 2005์˜ ๋ฒค์น˜ ๋งˆํฌ ํšŒ๋กœ๋ฅผ ์‚ฌ์šฉ ํ•œ ์‹คํ—˜์„ ํ†ตํ•ด ์ œ์•ˆ ๋œ ๋ฐฉ๋ฒ•์ด ์ด์ „์˜ ๋ฐ์ดํ„ฐ ๊ตฌ๋™ ํด๋ก ๊ฒŒ์ดํŒ… ๋ฐฉ์‹๊ณผ ๋น„๊ตํ•˜์—ฌ ์ด ์ „๋ ฅ์„ 5.6 % ๋ฐ ๋ฉด์ ์œผ๋กœ 5.3 % ์ค„์ผ ์ˆ˜ ์žˆ์Œ์„ ๋ณด์—ฌ ์ฃผ์—ˆ๋‹ค.In this paper, we introduce dynamic power optimization techniques applicable for various design stage from standard cell to placement stage. This work firstly investi๏ฟฝgates the problem of how designing data-driven (i.e., toggling based) clock gating can be closely integrated with the synthesis of flip-flops, which has never been addressed in the prior clock gating works. Our key observation is that some internal part of a flip-flop cell can be reused to generate its clock gating enable signal. Based on this, we propose a newly optimized flip-flop wiring structure, called eXOR-FF, in which an internal logic can be reused for every clock cycle to decide if the flip-flop is to be activated or inactivated through clock gating, thereby achieving area saving (thus, leakage as well as dynamic power saving) on every pair of flip-flop and its toggling detection logic. Then, we propose a comprehensive methodology of placement/timing๏ฟฝaware clock gating exploration that provides two unique strengths: best suited for max๏ฟฝimally exploiting the benefit of eXOR-FFs and precise analyses on the decomposition of power consumptions and timing impact, and translating them into cost functions in core engine of clock gating exploration. Through experiments with benchmark circuits in ISCAS89, ITC89, ITC99 and IWLS 2005, it is shown that our proposed method is able to reduce the total power by 5.6% and total cell area by 5.3% compared with the previous data-driven clock gating method in [1].Abstract Contents List of Tables List of Figures 1 Introduction 1.1 Power Consumption in CMOS Digital Design 1.2 Low Power Design Methodologies 1.3 Contribution of This Thesis 2 Preliminary and Motivations 6 2.1 Background 2.2 Observation on Area and Power Saving 2.3 Observation on Timing Impact 3 Redesign of Flip-flops Specialized for Clock Gating 3.1 Observation on Area Impact 4 Placement-aware Clock Gating Methodology Utilizing eXOR-FF Cells 4.1 Overall Design Flow 4.2 Cost Formulation for Conventional Clock Gating 4.3 Cost Formulation for Our Clock Gating using eXOR-FFs 5 Experiments 5.1 Experimental Setup 5.2 Experimental Results 5.3 Comparing with Industry Algorithm 6 Conclusion Abstract (In Korean)Maste

    Research on low power technology by AC power supply circuits

    Get PDF
    ๅˆถๅบฆ:ๆ–ฐ ; ๅ ฑๅ‘Š็•ชๅท:็”ฒ3692ๅท ; ๅญฆไฝใฎ็จฎ้กž:ๅšๅฃซ(ๅทฅๅญฆ) ; ๆŽˆไธŽๅนดๆœˆๆ—ฅ:2012/9/15 ; ๆ—ฉๅคงๅญฆไฝ่จ˜็•ชๅท:ๆ–ฐ6060Waseda Universit

    High-level synthesis of dataflow programs for heterogeneous platforms:design flow tools and design space exploration

    Get PDF
    The growing complexity of digital signal processing applications implemented in programmable logic and embedded processors make a compelling case the use of high-level methodologies for their design and implementation. Past research has shown that for complex systems, raising the level of abstraction does not necessarily come at a cost in terms of performance or resource requirements. As a matter of fact, high-level synthesis tools supporting such a high abstraction often rival and on occasion improve low-level design. In spite of these successes, high-level synthesis still relies on programs being written with the target and often the synthesis process, in mind. In other words, imperative languages such as C or C++, most used languages for high-level synthesis, are either modified or a constrained subset is used to make parallelism explicit. In addition, a proper behavioral description that permits the unification for hardware and software design is still an elusive goal for heterogeneous platforms. A promising behavioral description capable of expressing both sequential and parallel application is RVC-CAL. RVC-CAL is a dataflow programming language that permits design abstraction, modularity, and portability. The objective of this thesis is to provide a high-level synthesis solution for RVC-CAL dataflow programs and provide an RVC-CAL design flow for heterogeneous platforms. The main contributions of this thesis are: a high-level synthesis infrastructure that supports the full specification of RVC-CAL, an action selection strategy for supporting parallel read and writes of list of tokens in hardware synthesis, a dynamic fine-grain profiling for synthesized dataflow programs, an iterative design space exploration framework that permits the performance estimation, analysis, and optimization of heterogeneous platforms, and finally a clock gating strategy that reduces the dynamic power consumption. Experimental results on all stages of the provided design flow, demonstrate the capabilities of the tools for high-level synthesis, software hardware Co-Design, design space exploration, and power optimization for reconfigurable hardware. Consequently, this work proves the viability of complex systems design and implementation using dataflow programming, not only for system-level simulation but real heterogeneous implementations

    ๋น„์šฉ ํšจ์œจ์ ์ธ ํด๋Ÿญ ๋ฐ ํŒŒ์›Œ ๊ฒŒ์ดํŒ… ์„ค๊ณ„ ๋ฐฉ๋ฒ•๋ก 

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ(๋ฐ•์‚ฌ)--์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› :๊ณต๊ณผ๋Œ€ํ•™ ์ „๊ธฐยท์ •๋ณด๊ณตํ•™๋ถ€,2020. 2. ๊น€ํƒœํ™˜.์ €์ „๋ ฅ ์„ค๊ณ„๋Š” ์ตœ์‹  ์‹œ์Šคํ…œ-์˜จ-์นฉ (SoCs) ์„ค๊ณ„์—์„œ ๋งค์šฐ ์ค‘์š”ํ•œ ์š”์†Œ ์ค‘์˜ ํ•˜๋‚˜์ด๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ๋™์  ๋ฐ ์ •์  ์ „๋ ฅ ์†Œ๋น„๋ฅผ ๊ฐ์†Œ์‹œํ‚ค๊ธฐ ์œ„ํ•œ ์ €์ „๋ ฅ ์„ค๊ณ„ ๋ฐฉ๋ฒ•๋ก ์— ๋Œ€ํ•ด ๋…ผํ•œ๋‹ค. ๊ตฌ์ฒด์ ์œผ๋กœ ๋น„์šฉ ํšจ์œจ์ ์ธ ์ €์ „๋ ฅ ์„ค๊ณ„๋ฅผ ์œ„ํ•˜์—ฌ ๋‘ ๊ฐ€์ง€ ์ƒˆ๋กœ์šด ๊ธฐ์ˆ ์„ ์ œ์•ˆํ•œ๋‹ค. ์šฐ์„  ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ๋™์  ์ „๋ ฅ ์†Œ๋น„๋ฅผ ์ค„์ผ ์ˆ˜ ์žˆ๋Š” ์ƒˆ๋กœ์šด ํด๋Ÿญ ๊ฒŒ์ดํŒ… ๋ฐฉ๋ฒ•์„ ์ œ์•ˆํ•œ๋‹ค. ๊ธฐ์กด ํ”Œ๋ฆฝ-ํ”Œ๋ž ์ž…๋ ฅ ๋ฐ์ดํ„ฐ ํ† ๊ธ€ ๊ธฐ๋ฐ˜ ํด๋Ÿญ ๊ฒŒ์ดํŒ…์€ ๊ฐ€์žฅ ๋„๋ฆฌ ์‚ฌ์šฉ๋˜๋Š” ํด๋Ÿญ ๊ฒŒ์ดํŒ… ๊ธฐ๋ฒ• ์ค‘์˜ ํ•˜๋‚˜์ด๋‹ค. ํ•˜์ง€๋งŒ ์ด ๋ฐฉ๋ฒ•์€ ๋” ๋งŽ์€ ํ”Œ๋ฆฝ-ํ”Œ๋ž์— ๋Œ€ํ•ด ์ ์šฉํ• ์ˆ˜๋ก ํด๋Ÿญ ๊ฒŒ์ดํŒ…์— ํ•„์š”ํ•œ ๋ถ€๊ฐ€ ํšŒ๋กœ๊ฐ€ ๊ธ‰๊ฒฉํžˆ ์ฆ๊ฐ€ํ•œ๋‹ค๋Š” ๊ทผ๋ณธ์ ์ธ ํ•œ๊ณ„๋ฅผ ์ง€๋‹ˆ๊ณ  ์žˆ๋‹ค. ์ด๋Ÿฌํ•œ ํ•œ๊ณ„๋ฅผ ๊ทน๋ณตํ•˜๊ธฐ ์œ„ํ•˜์—ฌ ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์ด ์ƒˆ๋กœ์šด ํด๋Ÿญ ๊ฒŒ์ดํŒ… ๋ฐฉ๋ฒ•์„ ์ œ์•ˆํ•œ๋‹ค. ์ฒซ ๋ฒˆ์งธ๋กœ ๊ธฐ์กด ์ž…๋ ฅ ๋ฐ์ดํ„ฐ ํ† ๊ธ€ ๊ธฐ๋ฐ˜ ํด๋Ÿญ ๊ฒŒ์ดํŒ… ๋ฐฉ๋ฒ•์— ํ•„์š”ํ•œ ํšŒ๋กœ ์ž์›์„ ๋ถ„์„ํ•˜์—ฌ ํ•ด๋‹น ๋ฐฉ๋ฒ•์˜ ๋น„ํšจ์œจ์„ฑ์„ ๋ณด์ด๊ณ , ๊ธฐ์กด ๋ฐฉ๋ฒ•์—์„œ ์‚ฌ์šฉ๋˜๋Š” ์ž…๋ ฅ ๋ฐ์ดํ„ฐ ํ† ๊ธ€ ๊ฒ€์ถœ์— ํ•„์ˆ˜์ ์ด์ง€๋งŒ ๊ณ ๋น„์šฉ์˜ XOR ๊ฒŒ์ดํŠธ๋ฅผ ์™„๋ฒฝํžˆ ์ œ๊ฑฐํ•œ ํ”Œ๋ฆฝ-ํ”Œ๋ž ์ƒํƒœ ๊ธฐ๋ฐ˜ ํด๋Ÿญ ๊ฒŒ์ดํŒ…'์ด๋ผ๋Š” ์ƒˆ๋กœ์šด ํด๋Ÿญ ๊ฒŒ์ดํŒ… ๋ฐฉ๋ฒ•์„ ์ œ์•ˆํ•œ๋‹ค. ๋‘ ๋ฒˆ์งธ๋กœ ์ œ์•ˆ๋œ XOR ๊ฒŒ์ดํŠธ๊ฐ€ ํ•„์š” ์—†๋Š” ํด๋Ÿญ ๊ฒŒ์ดํŒ… ๋ฐฉ๋ฒ•์„ ์œ„ํ•œ ๋ถ€๊ฐ€ ํšŒ๋กœ๋ฅผ ์ œ์‹œํ•˜๋ฉฐ, ๋‹ค์–‘ํ•œ ํƒ€์ด๋ฐ ๋ถ„์„์„ ํ†ตํ•˜์—ฌ ํ•ด๋‹น ํšŒ๋กœ๊ฐ€ ์•ˆ์ •์ ์œผ๋กœ ์ ์šฉ๋  ์ˆ˜ ์žˆ์Œ์„ ๋ณด์ธ๋‹ค. ์„ธ ๋ฒˆ์งธ๋กœ ํšŒ๋กœ์˜ ํ”Œ๋ฆฝ-ํ”Œ๋ž ์ƒํƒœ ํ”„๋กœํŒŒ์ผ์— ๊ธฐ๋ฐ˜ํ•˜์—ฌ, ์ œ์•ˆ๋œ ํด๋Ÿญ ๊ฒŒ์ดํŒ… ๊ธฐ๋ฒ•์„ ๊ธฐ์กด ํด๋Ÿญ ๊ฒŒ์ดํŒ… ๊ธฐ๋ฒ•๊ณผ ์™„๋ฒฝํ•˜๊ฒŒ ํ†ตํ•ฉํ•  ์ˆ˜ ์žˆ๋Š” ํด๋Ÿญ ๊ฒŒ์ดํŒ… ๋ฐฉ๋ฒ•๋ก ์„ ์ œ์•ˆํ•œ๋‹ค. ์—ฌ๋Ÿฌ ๋ฒค์น˜๋งˆํฌ ํšŒ๋กœ์— ๋Œ€ํ•œ ์‹คํ—˜ ๊ฒฐ๊ณผ๋Š” ๊ธฐ์กด ์ž…๋ ฅ ๋ฐ์ดํ„ฐ ํ† ๊ธ€ ๊ธฐ๋ฐ˜ ํด๋Ÿญ ๊ฒŒ์ดํŒ… ๋ฐฉ๋ฒ•์ด ์ „๋ ฅ ์†Œ๋น„ ์ ˆ๊ฐ ๊ธฐํšŒ๋ฅผ ๋†“์น˜๋Š” ๋ฐ˜๋ฉด ๋ณธ ๋…ผ๋ฌธ์—์„œ ์ œ์•ˆ๋œ ๋ฐฉ๋ฒ•์€ ๋ชจ๋“  ํƒ€์ด๋ฐ ์ œ์•ฝ ์กฐ๊ฑด์„ ๋งŒ์กฑํ•˜๋ฉด์„œ ์ „๋ ฅ ์†Œ๋น„ ๊ฐ์†Œ์— ๋งค์šฐ ํšจ๊ณผ์ ์ž„์„ ๋ณด์—ฌ์ค€๋‹ค. ๋‹ค์Œ์œผ๋กœ ์ •์  ์ „๋ ฅ ์†Œ๋น„๋ฅผ ์ค„์ด๊ธฐ ์œ„ํ•œ ๋ฐฉ์•ˆ์œผ๋กœ, ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ๊ธฐ์กด ํŒŒ์›Œ ๊ฒŒ์ดํŠธ ํšŒ๋กœ์˜ ์ƒํƒœ ๋ณด์กด์šฉ ์ €์žฅ ๊ณต๊ฐ„ ํ• ๋‹น ๋ฐฉ๋ฒ•๋“ค์ด ์ง€๋‹ˆ๊ณ  ์žˆ๋Š” ๋‘ ๊ฐ€์ง€ ์ค‘์š”ํ•œ ํ•œ๊ณ„๋“ค์„ ํ•ด๊ฒฐํ•  ์ˆ˜ ์žˆ๋Š” ๋ฐฉ๋ฒ•์„ ์ œ์•ˆํ•œ๋‹ค. ์ค‘์š”ํ•œ ํ•œ๊ณ„๋“ค์ด๋ž€ ์ฒซ ๋ฒˆ์งธ๋กœ ๋‹ค์ค‘-๋น„ํŠธ ์ƒํƒœ ๋ณด์กด ํ”Œ๋ฆฝ-ํ”Œ๋ž์˜ ๋ฌด๋ถ„๋ณ„ํ•œ ์‚ฌ์šฉ์œผ๋กœ ์ธํ•œ ๊ธด ์›จ์ดํฌ์—… ์ง€์—ฐ ์‹œ๊ฐ„์ด๋ฉฐ, ๋‘ ๋ฒˆ์งธ๋กœ ๋ฉ€ํ‹ฐํ”Œ๋ ‰์„œ ๋˜๋จน์ž„ ๋ฃจํ”„๊ฐ€ ์žˆ๋Š” ์ƒํƒœ ๋ณด์กด ํ”Œ๋ฆฝ-ํ”Œ๋ž์˜ ์ตœ์ ํ™” ๋ถˆ๊ฐ€๋Šฅ์„ฑ์ด๋‹ค. ๊ธฐ์กด ๋ฐฉ๋ฒ•๋“ค์—์„œ๋Š” ์ƒํƒœ ๋ณด์กด์„ ์œ„ํ•œ ์ €์žฅ ๊ณต๊ฐ„์„ ์ตœ์†Œํ™”ํ•˜๊ธฐ ์œ„ํ•ด ๊ธด ์›จ์ดํฌ์—… ์ง€์—ฐ ์‹œ๊ฐ„์ด ํ•„์ˆ˜์ ์ด์—ˆ๋‹ค. ๊ทธ๋ฆฌ๊ณ  ๋˜๋จน์ž„ ๋ฃจํ”„๊ฐ€ ์žˆ๋Š” ํ”Œ๋ฆฝ-ํ”Œ๋ž์€ ์ตœ์ ํ™”ํ•  ์ˆ˜ ์—†๋Š” ๋Œ€์ƒ์œผ๋กœ ๋‹ค๋ฃจ์–ด์กŒ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ์ผ๋ฐ˜์ ์œผ๋กœ ํ•˜๋“œ์›จ์–ด ๊ธฐ์ˆ  ์–ธ์–ด(HDL)๋กœ๋ถ€ํ„ฐ ์ƒ์„ฑ๋˜๋Š” ๋˜๋จน์ž„ ๋ฃจํ”„๋ฅผ ์ง€๋‹Œ ํ”Œ๋ฆฝ-ํ”Œ๋ž์€ ๋ฌด์‹œํ•  ์ˆ˜ ์žˆ์„ ์ •๋„๋กœ ์ ์€ ์–‘์ด ์•„๋‹ˆ๋‹ค. ์ฒซ ๋ฒˆ์งธ ํ•œ๊ณ„๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•œ ๋ฐฉ๋ฒ•์œผ๋กœ ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ์ตœ๋Œ€ 2 ๋น„ํŠธ์˜ ๋‹ค์ค‘-๋น„ํŠธ ์ƒํƒœ ๋ณด์กด ํ”Œ๋ฆฝ-ํ”Œ๋ž์„ ์‚ฌ์šฉํ•˜์—ฌ ์›จ์ดํฌ์—… ์ง€์—ฐ ์‹œ๊ฐ„์„ ๋‘ ํด๋Ÿญ ์‚ฌ์ดํด๋กœ ์ œํ•œํ•˜๋ฉด์„œ๋„ ์ƒํƒœ ๋ณด์กด์„ ์œ„ํ•œ ์ €์žฅ ๊ณต๊ฐ„์„ ํšจ์œจ์ ์œผ๋กœ ์ ˆ์•ฝํ•  ์ˆ˜ ์žˆ์Œ์„ ๋ณด์ธ๋‹ค. ๊ทธ๋ฆฌ๊ณ  ๋‘ ๋ฒˆ์งธ ํ•œ๊ณ„๋ฅผ ๊ทน๋ณตํ•˜๊ธฐ ์œ„ํ•ด์„œ ๋˜๋จน์ž„ ๋ฃจํ”„๋ฅผ ์ง€๋‹Œ ํ”Œ๋ฆฝ-ํ”Œ๋ž์ด ํฌํ•จ๋œ ๋‘ ํ”Œ๋ฆฝ-ํ”Œ๋ž ์Œ์˜ ์ƒํƒœ๋ฅผ ๋ณต์›ํ•  ์ˆ˜ ์žˆ๋Š” 2๋‹จ ์ƒํƒœ ๋ณด์กด ์ œ์–ด ๋ฐฉ์•ˆ์„ ์ œ์•ˆํ•œ๋‹ค. ๋˜ํ•œ ์ฃผ์–ด์ง„ ํšŒ๋กœ์—์„œ ์ถฉ๋Œ์—†์ด ๋™์‹œ์— ์กด์žฌํ•  ์ˆ˜ ์žˆ๋Š” ํ”Œ๋ฆฝ-ํ”Œ๋ž ์Œ์„ ์ตœ๋Œ€๋กœ ์ถ”์ถœํ•˜๊ธฐ ์œ„ํ•ด ๋…๋ฆฝ ์ง‘ํ•ฉ ๋ฌธ์ œ(independent set problem)๊ธฐ๋ฐ˜์˜ ์—ฐ์‚ฐ๋ฒ•๋„ ์ œ์•ˆํ•œ๋‹ค. ๋ฒค์น˜๋งˆํฌ ํšŒ๋กœ์— ๋Œ€ํ•œ ์‹คํ—˜ ๊ฒฐ๊ณผ๋Š” ๋ณธ ๋…ผ๋ฌธ์—์„œ ์ œ์•ˆ๋œ ๋ฐฉ๋ฒ•์ด ์›จ์ดํฌ์—… ์ง€์—ฐ ์‹œ๊ฐ„์„ ๋‘ ํด๋Ÿญ ์‚ฌ์ดํด๋กœ ์ œํ•œํ•˜๋ฉด์„œ๋„ ์ƒํƒœ ๋ณด์กด์— ํ•„์š”ํ•œ ์ €์žฅ ๊ณต๊ฐ„๊ณผ ํŒŒ์›Œ๋ฅผ ๊ฐ์†Œ์‹œํ‚ค๋Š”๋ฐ ๋งค์šฐ ํšจ๊ณผ์ ์ž„์„ ๋ณด์—ฌ์ค€๋‹ค.Low power design is of great importance in modern system-on-chips (SoCs). This dissertation studies on low power design methodologies for saving dynamic and static power consumption. Precisely, we unveil two novel techniques of cost effective low power design. Firstly, we propose a novel clock gating method for reducing the dynamic power consumption. Flip-flop's input data toggling based clock gating is one of the most commonly used clock gating methods, in which one critical and inherent limitation is the sharp increase of gating logic as more flip-flops are involved in gating. In this dissertation, we propose a new clock gating method to overcome this limitation. Specifically, (1) we analyze the resources of gating logic in the input data toggling based clock gating, from which an ineffectiveness in resource utilization is observed and we propose a new clock gating technique called flip-flop state driven clock gating which completely eliminates the essential and expensive component of XOR gates for detecting input toggling of flip-flops; (2) we provide the supporting logic circuitry of our proposed XOR-free clock gating, confirming its safe applicability through a comprehensive timing analysis; (3) we propose, based on the flip-flops' state profile, a clock gating methodology that seamlessly combines our flip-flop state based clock gating with the toggling based clock gating. Through experiments with benchmark circuits, it is confirmed that our clock gating method is very effective in reducing power, which otherwise the toggling based clock gating shall miss the power saving opportunity, while meeting all timing constraints. Secondly, for reducing the static power consumption, we solve two critical limitations of the conventional approaches to the allocation of state retention storage for power gated circuits. Those are (1) the long wakeup delay caused by the senseless use of multi-bit retention flip-flops (MBRFFs) and (2) the inability to optimize retention flip-flops for the flip-flops with mux-feedback loop. It should be noted that the conventional approaches have regarded the long wakeup delay as an inevitable consequence of maximizing the reduction of total storage size for state retention while they have treated the flip-flops with mux-feedback loop (called self-loop flip-flop) as nonoptimizable component, but practically, the self-loop flip-flops synthesized from hardware description language (HDL) code are not far from a small amount and thus, can in no way be negligible. More precisely, for solving (1), we show that the use of MBRFFs with up to two bits, consequently, constraining the wakeup delay to no more than two clock cycles, is enough to maintain the high reduction of total retention storage and for solving (2), we devise a 2-phase retention control mechanism for a pair of flip-flops, one of which has self-loop, by which just a single retention bit can be used to restore state of the two flip-flops, and propose an independent set based algorithm for maximally extracting the non-conflict pairs from circuits. Through experiments with benchmark circuits, it is shown that our proposed method is very effective against reducing the state retention storage and the power consumption compared with the existing best MBRFF allocation while the wakeup delay is strictly limited to two clock cycles.1 INTRODUCTION 1 1.1 Clock Gating 1 1.2 Power Gating and State Retention 3 1.3 Multi-bit Retention Registers 4 1.4 Contributions of This Dissertation 6 2 FLIP-FLOP STATE DRIVEN CLOCK GATING: CONCEPT, DESIGN, AND METHODOLOGY 9 2.1 Motivations 9 2.1.1 Toggling based Clock Gating 9 2.1.2 Area and Power by Clock Gating 10 2.2 The Proposed Clock Gating 13 2.2.1 Concept of Flip-flop State Driven Clock Gating 13 2.2.2 Design of Gating Logic Circuitry 17 2.2.3 Integrated Clock Gating Methodology 22 2.2.4 Cost Formulation 23 2.3 Experiments 25 2.3.1 Experimental Setup 25 2.3.2 Experimental Results 26 3 ALGORITHM AND DESIGN OPTIMIZATION OF ALLOCATING MULTI-BIT RETENTION FLIP-FLOPS FOR POWER GATED CIRCUITS 32 3.1 Motivations 32 3.1.1 Flip-flops with Mux-feedback Loop 32 3.1.2 Impact of Wakeup Delay 37 3.2 The Proposed Allocation Algorithm 39 3.3 Design of Multi-Bit Retention Flip-Flop and Multi-Bit Extension 48 3.3.1 Multi-Bit Retention Flip-Flop 48 3.3.2 Multi-Bit Flip-Flop Extension 52 3.4 Experiments 54 3.4.1 Experimental Setup 54 3.4.2 Experimental Results 57 4 CONCLUSIONS 65 4.1 Flip-flop State Driven Clock Gating: Concept, Design, and Methodology 65 4.2 Algorithm and Design Optimization of Allocating Multi-bit Retention Flip-flops for Power Gated Circuits 66 Abstract (In Korean) 71Docto

    Cross-Layer Automated Hardware Design for Accuracy-Configurable Approximate Computing

    Get PDF
    Approximate Computing trades off computation accuracy against performance or energy efficiency. It is a design paradigm that arose in the last decade as an answer to diminishing returns from Dennard\u27s scaling and a shift in the prominent workloads. A range of modern workloads, categorized mainly as recognition, mining, and synthesis, features an inherent tolerance to approximations. Their characteristics, such as redundancies in their input data and robust-to-noise algorithms, allow them to produce outputs of acceptable quality, despite an approximation in some of their computations. Approximate Computing leverages the application tolerance by relaxing the exactness in computation towards primary design goals of increasing performance or improving energy efficiency. Existing techniques span across the abstraction layers of computer systems where cross-layer techniques are shown to offer a larger design space and yield higher savings. Currently, the majority of the existing work aims at meeting a single accuracy. The extent of approximation tolerance, however, significantly varies with a change in input characteristics and applications. In this dissertation, methods and implementations are presented for cross-layer and automated design of accuracy-configurable Approximate Computing to maximally exploit the performance and energy benefits. In particular, this dissertation addresses the following challenges and introduces novel contributions: A main Approximate Computing category in hardware is to scale either voltage or frequency beyond the safe limits for power or performance benefits, respectively. The rationale is that timing errors would be gradual and for an initial range tolerable. This scaling enables a fine-grain accuracy-configurability by varying the timing error occurrence. However, conventional synthesis tools aim at meeting a single delay for all paths within the circuit. Subsequently, with voltage or frequency scaling, either all paths succeed, or a large number of paths fail simultaneously, with a steep increase in error rate and magnitude. This dissertation presents an automated method for minimizing path delays by individually constraining the primary outputs of combinational circuits. As a result, it reduces the number of failing paths and makes the timing errors significantly more gradual, and also rarer and smaller on average. Additionally, it reveals that delays can be significantly reduced towards the least significant bit (LSB) and allows operating at a higher frequency when small operands are computed. Precision scaling, i.e., reducing the representation of data and its accuracy is widely used in multiple abstraction layers in Approximate Computing. Reducing data precision also reduces the transistor toggles, and therefore the dynamic power consumption. Application and architecture level precision scaling results in using only LSBs of the circuit. Arithmetic circuits often have less complexity and logic depth in LSBs compared to most significant bits (MSB). To take advantage of this circuit property, a delay-altering synthesis methodology is proposed. The method finds energy-optimal delay values under configurable precision usage and assigns them to primary outputs used for different precisions. Thereby, it enables dynamic frequency-precision scalable circuits for energy efficiency. Within the hardware architecture, it is possible to instantiate multiple units with the same functionality with different fixed approximation levels, where each block benefits from having fewer transistors and also synthesis relaxations. These blocks can be selected dynamically and thus allow to configure the accuracy during runtime. Instantiating such approximate blocks can be a lower dynamic power but higher area and leakage cost alternative to the current state-of-the-art gating mechanisms which switch off a group of paths in the circuit to reduce the toggling activity. Jointly, instantiating multiple blocks and gating mechanisms produce a large design space of accuracy-configurable hardware, where energy-optimal solutions require a cross-layer search in architecture and circuit levels. To that end, an approximate hardware synthesis methodology is proposed with joint optimizations in architecture and circuit for dynamic accuracy scaling, and thereby it enables energy vs. area trade-offs
    • โ€ฆ
    corecore