
    Vector processing-aware advanced clock-gating techniques for low-power fused multiply-add

    The need for power efficiency is driving a rethink of design decisions in processor architectures. While vector processors succeeded in the high-performance market in the past, they need to be retailored for the mobile market they are now entering. Floating-point (FP) fused multiply-add (FMA), being a functional unit with high power consumption, deserves special attention. Although clock gating is a well-known method to reduce switching power in synchronous designs, there are unexplored opportunities for its application to vector processors, especially in active operating mode. In this research, we comprehensively identify, propose, and evaluate the most suitable clock-gating techniques for vector FMA units (VFUs). These techniques ensure power savings without jeopardizing timing. We evaluate the proposed techniques using both synthetic and "real-world" application-based benchmarking. Using vector masking and vector multilane-aware clock gating, we report power reductions of up to 52%, assuming an active VFU operating at peak performance. Among other findings, we observe that vector instruction-based clock-gating techniques achieve power savings for all vector FP instructions. Finally, when all techniques are evaluated together using "real-world" benchmarking, the power reductions are up to 80%. Additionally, in accordance with processor design trends, we perform this research in a fully parameterizable and automated fashion. The research leading to these results has received funding from the RoMoL ERC Advanced Grant GA 321253 and is supported in part by the European Union (FEDER funds) under contract TIN2015-65316-P. The work of I. Ratkovic was supported by an FPU research grant from the Spanish MECD. Peer reviewed. Postprint (author's final draft).
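    To make mask- and lane-aware gating concrete, here is a toy Python model, not the authors' design. It assumes a hypothetical VFU whose `lanes` lanes process consecutive elements round-robin. It counts the lane-cycles in which a lane's FMA pipeline could have its clock gated because the element lies beyond the vector length or is masked off. The names `VectorOp`, `lane_enables` and `gated_fraction` are illustrative. The gated fraction is an upper bound on opportunity, not a power figure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VectorOp:
    """A hypothetical vector FMA instruction: vl active elements plus a per-element mask."""
    vl: int
    mask: List[bool]  # mask[i] is False when element i is masked off


def lane_enables(op: VectorOp, lanes: int) -> List[List[bool]]:
    """Per-cycle clock enables for each lane of the VFU.

    Assumes element i is processed by lane i % lanes in cycle i // lanes.
    A lane is clocked only if its element exists (i < vl) and is unmasked.
    """
    if len(op.mask) < op.vl:
        raise ValueError("mask must cover every active element")
    cycles = -(-op.vl // lanes)  # ceiling division
    enables = []
    for c in range(cycles):
        row = []
        for lane in range(lanes):
            i = c * lanes + lane
            row.append(i < op.vl and op.mask[i])
        enables.append(row)
    return enables


def gated_fraction(op: VectorOp, lanes: int) -> float:
    """Fraction of lane-cycles whose clock could be gated for this instruction."""
    enables = lane_enables(op, lanes)
    total = len(enables) * lanes
    if total == 0:
        return 0.0
    active = sum(sum(row) for row in enables)
    return 1 - active / total


if __name__ == "__main__":
    op = VectorOp(vl=6, mask=[True, False] * 3)
    print(gated_fraction(op, lanes=4))  # 0.625
```

In the example, the instruction occupies 2 cycles on 4 lanes (8 lane-cycles), but only elements 0, 2 and 4 are active. The remaining 5 lane-cycles are gating candidates.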

    A software controlled voltage tuning system using multi-purpose ring oscillators

    This paper presents a novel software-driven voltage tuning method that utilises multi-purpose Ring Oscillators (ROs) to provide process-variation- and environment-sensitive energy reductions. The proposed technique tunes the voltage based on the observed frequency of the ROs, which is taken as a representation of device speed and used to estimate a safe minimum operating voltage at a given core frequency. A conservative linear relationship between RO frequency and silicon speed is used to approximate the critical path of the processor. Using a multi-purpose RO not specifically implemented for critical path characterisation is a unique approach to voltage tuning. The parameters governing the relationship between RO and silicon speed are obtained by testing a sample of processors from different wafer regions. These parameters can then be used on all devices of that model. The tuning method and software control framework are demonstrated on a sample of XMOS XS1-U8A-64 embedded microprocessors, yielding a dynamic power saving of up to 25% with no performance reduction and no negative impact on the real-time constraints of the embedded software running on the processor.
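    The voltage selection step can be sketched as follows. The sketch assumes a calibrated linear model `fmax = slope * f_ro + offset` per processor model and a fixed safety margin. The function names, the sweep over candidate voltages, and the `read_ro_mhz` callback (standing in for reading the RO counter on the device) are hypothetical. The XMOS-specific control framework is not reproduced.

```python
from typing import Callable, Iterable, Optional


def predicted_fmax_mhz(f_ro_mhz: float, slope: float, offset_mhz: float) -> float:
    """Conservative linear map from ring-oscillator frequency to critical-path fmax.

    slope and offset_mhz stand for per-model calibration constants obtained by
    characterising sample devices; the values used below are hypothetical.
    """
    return slope * f_ro_mhz + offset_mhz


def min_safe_voltage(
    voltages: Iterable[float],
    read_ro_mhz: Callable[[float], float],
    core_mhz: float,
    slope: float,
    offset_mhz: float,
    margin: float = 0.05,
) -> Optional[float]:
    """Return the lowest candidate voltage whose predicted fmax covers core_mhz
    plus a safety margin, or None if no candidate qualifies."""
    required_mhz = core_mhz * (1.0 + margin)
    for v in sorted(voltages):
        # On hardware, read_ro_mhz(v) would set the supply to v and read the RO counter.
        if predicted_fmax_mhz(read_ro_mhz(v), slope, offset_mhz) >= required_mhz:
            return v
    return None  # keep the nominal voltage


if __name__ == "__main__":
    def fake_ro(v: float) -> float:
        return 1000.0 * v  # hypothetical RO response, not measured data

    print(min_safe_voltage([0.9, 0.95, 1.0, 1.1], fake_ro,
                           core_mhz=500.0, slope=0.5, offset_mhz=0.0))  # 1.1
```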

    A novel clock gating approach for the design of low-power linear feedback shift register

    This paper presents an efficient solution to reduce the power consumption of the popular linear feedback shift register by exploiting the gated clock approach. The power reduction with respect to other gated clock schemes is obtained through an efficient implementation of the logic gates and by properly reducing the number of XOR gates in the feedback network. Transistor-level simulations are performed using standard cells in a 28-nm FD-SOI CMOS technology with a 300-MHz clock. Simulation results show a power reduction with respect to traditional implementations that exceeds 30%.
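    A few lines of Python show the opportunity that gated-clock LFSRs exploit. In a shift register every stage except the first simply copies its neighbour, so a flip-flop whose next value equals its current one does not need a clock edge. The sketch below uses a hypothetical 4-bit Fibonacci LFSR (feedback `s[3] ^ s[2]`, a maximal-length configuration with period 15). It counts how many clock pulses an ideal data-driven gate could suppress. It ignores the cost of the gating logic itself and does not model the paper's specific gate implementation or its XOR reduction.

```python
from typing import List, Tuple


def lfsr_step(state: List[int]) -> List[int]:
    """One clock of a 4-bit Fibonacci LFSR: s[0] receives s[3] XOR s[2],
    and every other stage copies its left neighbour."""
    feedback = state[3] ^ state[2]
    return [feedback] + state[:-1]


def count_toggles(state: List[int], cycles: int) -> Tuple[int, int, List[int]]:
    """Return (flip-flop toggles, clock pulses without gating, final state).

    A flip-flop whose next value equals its current value needs no clock edge,
    so (pulses - toggles) bounds what an ideal data-driven gate could suppress.
    """
    toggles = 0
    for _ in range(cycles):
        nxt = lfsr_step(state)
        toggles += sum(a != b for a, b in zip(state, nxt))
        state = nxt
    return toggles, cycles * len(state), state


if __name__ == "__main__":
    print(count_toggles([1, 0, 0, 0], 15))  # (32, 60, [1, 0, 0, 0])
```

Over one full period, only 32 of the 60 flip-flop clock pulses actually change a bit. The other 28 are candidates for gating, before accounting for the gating logic's own power.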

    YodaNN: An Architecture for Ultra-Low Power Binary-Weight CNN Acceleration

    Convolutional neural networks (CNNs) have revolutionized the world of computer vision over the last few years, pushing image classification beyond human accuracy. The computational effort of today's CNNs requires power-hungry parallel processors or GP-GPUs. Recent developments in CNN accelerators for system-on-chip integration have reduced energy consumption significantly. Unfortunately, even these highly optimized devices are above the power envelope imposed by mobile and deeply embedded applications and face hard limitations caused by CNN weight I/O and storage. This prevents the adoption of CNNs in future ultra-low power Internet of Things end-nodes for near-sensor analytics. Recent algorithmic and theoretical advancements enable competitive classification accuracy even when limiting CNNs to binary (+1/-1) weights during training. These new findings bring major optimization opportunities in the arithmetic core by removing the need for expensive multiplications, as well as by reducing I/O bandwidth and storage. In this work, we present an accelerator optimized for binary-weight CNNs that achieves 1510 GOp/s at 1.2 V on a core area of only 1.33 MGE (Million Gate Equivalent) or 0.19 mm², with a power dissipation of 895 µW in UMC 65 nm technology at 0.6 V. Our accelerator significantly outperforms the state of the art in terms of energy and area efficiency, achieving 61.2 TOp/s/W@0.6 V and 1135 GOp/s/MGE@0.6 V, respectively.
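    A minimal Python sketch shows the arithmetic saving the abstract refers to. It illustrates binary-weight arithmetic in general, not YodaNN's datapath. With weights restricted to +1/-1, each multiply-accumulate collapses into an add or a subtract, and each weight fits in a single bit. The function names and the bit-packing convention (bit i set means +1) are hypothetical.

```python
from typing import Sequence


def binary_weight_dot(activations: Sequence[int], weights: Sequence[int]) -> int:
    """Dot product with weights restricted to +1/-1: every term is an add or a
    subtract of the activation, so no multiplier is needed."""
    if len(activations) != len(weights):
        raise ValueError("length mismatch")
    acc = 0
    for x, w in zip(activations, weights):
        if w == 1:
            acc += x
        elif w == -1:
            acc -= x
        else:
            raise ValueError("weights must be +1 or -1")
    return acc


def binary_weight_dot_packed(activations: Sequence[int], weight_bits: int) -> int:
    """Same computation with one bit of storage per weight (bit i set -> +1, clear -> -1)."""
    acc = 0
    for i, x in enumerate(activations):
        acc += x if (weight_bits >> i) & 1 else -x
    return acc


if __name__ == "__main__":
    print(binary_weight_dot([3, -1, 4, 2], [1, -1, -1, 1]))  # 2
    print(binary_weight_dot_packed([3, -1, 4, 2], 0b1001))   # 2
```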

    Cost-Effective Clock and Power Gating Design Methodology (비용 효율적인 클럭 및 파워 게이팅 설계 방법론)

    Doctoral dissertation, Seoul National University Graduate School, College of Engineering, Department of Electrical and Computer Engineering, February 2020. Advisor: Taewhan Kim (김태환). Low-power design is of great importance in modern systems-on-chip (SoCs). This dissertation studies low-power design methodologies for reducing dynamic and static power consumption. Specifically, we present two novel techniques for cost-effective low-power design. First, we propose a novel clock gating method for reducing dynamic power consumption. Clock gating based on flip-flop input data toggling is one of the most commonly used clock gating methods. However, it has a critical, inherent limitation: the gating logic grows sharply as more flip-flops are involved in gating. In this dissertation, we propose a new clock gating method to overcome this limitation. Specifically, (1) we analyze the gating-logic resources of toggling-based clock gating, observe that they are used ineffectively, and propose a new technique called flip-flop state driven clock gating, which completely eliminates the essential but expensive XOR gates that detect input toggling of flip-flops; (2) we provide the supporting logic circuitry for the proposed XOR-free clock gating and confirm its safe applicability through a comprehensive timing analysis; and (3) based on the flip-flops' state profile, we propose a clock gating methodology that seamlessly combines flip-flop state based clock gating with toggling-based clock gating.
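    For reference, the conventional toggling-based scheme that the dissertation takes as its baseline can be sketched in Python. A group of flip-flops shares one gated clock, and the group is clocked only when at least one flip-flop's input `D` differs from its output `Q`. This requires one XOR per flip-flop plus an OR tree. The function names and the simple 2-input gate count are illustrative assumptions. The proposed XOR-free, flip-flop state driven scheme is not reproduced here.

```python
from typing import Dict, List, Sequence, Tuple


def toggle_gating_enable(d: Sequence[int], q: Sequence[int]) -> bool:
    """Clock enable for a gated group: true if any flip-flop would change state (D XOR Q)."""
    return any(di ^ qi for di, qi in zip(d, q))


def gated_cycles(trace: List[Tuple[Sequence[int], Sequence[int]]]) -> int:
    """Count cycles in a (D, Q) trace in which the whole group's clock is suppressed."""
    return sum(not toggle_gating_enable(d, q) for d, q in trace)


def gating_logic_cost(group_size: int) -> Dict[str, int]:
    """Rough 2-input gate count of the toggle detector: one XOR per flip-flop,
    plus an OR tree that merges the XOR outputs into a single enable."""
    return {"xor2": group_size, "or2": max(group_size - 1, 0)}


if __name__ == "__main__":
    print(gating_logic_cost(8))  # {'xor2': 8, 'or2': 7}
```

Because any single toggling flip-flop enables the whole group, larger groups are enabled more often, while the XOR count grows with every flip-flop added.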
Experiments with benchmark circuits confirm that our clock gating method is very effective in reducing power, capturing savings that toggling-based clock gating misses, while meeting all timing constraints. Second, for reducing static power consumption, we resolve two critical limitations of conventional approaches to allocating state retention storage in power-gated circuits. These are (1) the long wakeup delay caused by the indiscriminate use of multi-bit retention flip-flops (MBRFFs), and (2) the inability to optimize retention storage for flip-flops with a mux-feedback loop. Conventional approaches have regarded the long wakeup delay as an inevitable consequence of maximizing the reduction of total state retention storage. They have also treated flip-flops with a mux-feedback loop (called self-loop flip-flops) as non-optimizable components. In practice, however, self-loop flip-flops synthesized from hardware description language (HDL) code are far from rare and cannot be neglected. More precisely, to solve (1), we show that using MBRFFs of at most two bits, which constrains the wakeup delay to no more than two clock cycles, is enough to maintain a high reduction of total retention storage. To solve (2), we devise a 2-phase retention control mechanism for a pair of flip-flops, one of which has a self-loop, by which a single retention bit restores the state of both flip-flops. We also propose an independent set based algorithm for maximally extracting non-conflicting pairs from circuits.
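    The pair-extraction step can be sketched as an independent set problem on a conflict graph. This Python sketch makes two assumptions not taken from the dissertation. First, two candidate pairs conflict only when they share a flip-flop. Second, a min-degree greedy heuristic stands in for the author's algorithm (maximum independent set is NP-hard in general). The 2-phase retention control itself, and the conditions that make a pair valid, are not modeled, and all names are hypothetical. With only the shared-flip-flop rule, the problem reduces to matching; additional conflict rules would simply add edges.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (self-loop flip-flop, partner flip-flop)


def conflict_graph(pairs: List[Pair]) -> Dict[int, Set[int]]:
    """Vertices are candidate pairs; an edge joins two pairs that cannot both be used.
    Assumed conflict rule: the two pairs share a flip-flop."""
    graph: Dict[int, Set[int]] = {i: set() for i in range(len(pairs))}
    for i, j in combinations(range(len(pairs)), 2):
        if set(pairs[i]) & set(pairs[j]):
            graph[i].add(j)
            graph[j].add(i)
    return graph


def greedy_independent_set(graph: Dict[int, Set[int]]) -> List[int]:
    """Min-degree greedy heuristic: repeatedly pick the vertex with the fewest
    remaining neighbours, then delete it and its neighbours from the graph."""
    remaining = {v: set(nbrs) for v, nbrs in graph.items()}
    chosen: List[int] = []
    while remaining:
        v = min(remaining, key=lambda u: (len(remaining[u]), u))
        chosen.append(v)
        removed = remaining[v] | {v}
        for u in removed:
            remaining.pop(u, None)
        for nbrs in remaining.values():
            nbrs -= removed  # in-place update of each neighbour set
    return sorted(chosen)


def retention_bits(flip_flops: Set[str], selected: List[Pair]) -> int:
    """One retention bit per flip-flop, except that each selected pair shares one bit."""
    return len(flip_flops) - len(selected)


if __name__ == "__main__":
    candidates = [("a", "b"), ("a", "c"), ("d", "e"), ("c", "f")]
    picked = greedy_independent_set(conflict_graph(candidates))
    print([candidates[i] for i in picked])  # [('a', 'b'), ('d', 'e'), ('c', 'f')]
    print(retention_bits({"a", "b", "c", "d", "e", "f"},
                         [candidates[i] for i in picked]))  # 3
```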
Experiments with benchmark circuits show that the proposed method is very effective in reducing state retention storage and power consumption compared with the best existing MBRFF allocation, while the wakeup delay is strictly limited to two clock cycles.
    Table of contents:
    1 INTRODUCTION
      1.1 Clock Gating
      1.2 Power Gating and State Retention
      1.3 Multi-bit Retention Registers
      1.4 Contributions of This Dissertation
    2 FLIP-FLOP STATE DRIVEN CLOCK GATING: CONCEPT, DESIGN, AND METHODOLOGY
      2.1 Motivations
        2.1.1 Toggling based Clock Gating
        2.1.2 Area and Power by Clock Gating
      2.2 The Proposed Clock Gating
        2.2.1 Concept of Flip-flop State Driven Clock Gating
        2.2.2 Design of Gating Logic Circuitry
        2.2.3 Integrated Clock Gating Methodology
        2.2.4 Cost Formulation
      2.3 Experiments
        2.3.1 Experimental Setup
        2.3.2 Experimental Results
    3 ALGORITHM AND DESIGN OPTIMIZATION OF ALLOCATING MULTI-BIT RETENTION FLIP-FLOPS FOR POWER GATED CIRCUITS
      3.1 Motivations
        3.1.1 Flip-flops with Mux-feedback Loop
        3.1.2 Impact of Wakeup Delay
      3.2 The Proposed Allocation Algorithm
      3.3 Design of Multi-Bit Retention Flip-Flop and Multi-Bit Extension
        3.3.1 Multi-Bit Retention Flip-Flop
        3.3.2 Multi-Bit Flip-Flop Extension
      3.4 Experiments
        3.4.1 Experimental Setup
        3.4.2 Experimental Results
    4 CONCLUSIONS
      4.1 Flip-flop State Driven Clock Gating: Concept, Design, and Methodology
      4.2 Algorithm and Design Optimization of Allocating Multi-bit Retention Flip-flops for Power Gated Circuits
    Abstract (In Korean)

    Design of Adiabatic MTJ-CMOS Hybrid Circuits

    Low-power designs are a necessity given the increasing demand for battery-operated portable devices. In many such devices, operational speed is not as important as battery life. Logic-in-memory structures using nano-devices and adiabatic designs are two methods to reduce static and dynamic power consumption, respectively. The magnetic tunnel junction (MTJ) is an emerging technology that offers many advantages when used in logic-in-memory structures in conjunction with CMOS. In this paper, we introduce a novel adiabatic hybrid MTJ/CMOS structure, which is used to design AND/NAND, XOR/XNOR and 1-bit full adder circuits. We simulate the designs using HSPICE with 32 nm CMOS technology and compare them with non-adiabatic hybrid MTJ/CMOS circuits. The proposed adiabatic MTJ/CMOS full adder has more than 7 times lower power consumption than the previous MTJ/CMOS full adder.