663 research outputs found

    ์ •์  ๋žจ ๋ฐ ํŒŒ์›Œ ๊ฒŒ์ดํŠธ ํšŒ๋กœ์— ๋Œ€ํ•œ ์ „์•• ๋ฐ ๋ณด์กด์šฉ ๊ณต๊ฐ„ ํ• ๋‹น ๋ฌธ์ œ

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ(๋ฐ•์‚ฌ) -- ์„œ์šธ๋Œ€ํ•™๊ต๋Œ€ํ•™์› : ๊ณต๊ณผ๋Œ€ํ•™ ์ „๊ธฐยท์ •๋ณด๊ณตํ•™๋ถ€, 2021.8. ๊น€ํƒœํ™˜.์นฉ์˜ ์ €์ „๋ ฅ ๋™์ž‘์€ ์ค‘์š”ํ•œ ๋ฌธ์ œ์ด๋ฉฐ, ๊ณต์ •์ด ๋ฐœ์ „ํ•˜๋ฉด์„œ ๊ทธ ์ค‘์š”์„ฑ์€ ์ ์  ์ปค์ง€๊ณ  ์žˆ๋‹ค. ๋ณธ ๋…ผ๋ฌธ์€ ์นฉ์„ ๊ตฌ์„ฑํ•˜๋Š” ์ •์  ๋žจ(SRAM) ๋ฐ ๋กœ์ง(logic) ๊ฐ๊ฐ์— ๋Œ€ํ•ด์„œ ์ €์ „๋ ฅ์œผ๋กœ ๋™์ž‘์‹œํ‚ค๋Š” ๋ฐฉ๋ฒ•๋ก ์„ ๋…ผํ•œ๋‹ค. ์šฐ์„ , ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ์นฉ์„ ๋ฌธํ„ฑ ์ „์•• ๊ทผ์ฒ˜์˜ ์ „์••(NTV)์—์„œ ๋™์ž‘์‹œํ‚ค๊ณ ์ž ํ•  ๋•Œ ๋ชจ๋‹ˆํ„ฐ๋ง ํšŒ๋กœ์˜ ์ธก์ •์„ ํ†ตํ•ด ์นฉ ๋‚ด์˜ ๋ชจ๋“  SRAM ๋ธ”๋ก์—์„œ ๋™์ž‘ ์‹คํŒจ๊ฐ€ ๋ฐœ์ƒํ•˜์ง€ ์•Š๋Š” ์ตœ์†Œ ๋™์ž‘ ์ „์••์„ ์ถ”๋ก ํ•˜๋Š” ๋ฐฉ๋ฒ•๋ก ์„ ์ œ์•ˆํ•œ๋‹ค. ์นฉ์„ NTV ์˜์—ญ์—์„œ ๋™์ž‘์‹œํ‚ค๋Š” ๊ฒƒ์€ ์—๋„ˆ์ง€ ํšจ์œจ์„ฑ์„ ์ฆ๋Œ€์‹œํ‚ฌ ์ˆ˜ ์žˆ๋Š” ๋งค์šฐ ํšจ๊ณผ์ ์ธ ๋ฐฉ๋ฒ• ์ค‘ ํ•˜๋‚˜์ด์ง€๋งŒ SRAM์˜ ๊ฒฝ์šฐ ๋™์ž‘ ์‹คํŒจ ๋•Œ๋ฌธ์— ๋™์ž‘ ์ „์••์„ ๋‚ฎ์ถ”๊ธฐ ์–ด๋ ต๋‹ค. ํ•˜์ง€๋งŒ ์นฉ๋งˆ๋‹ค ์˜ํ–ฅ์„ ๋ฐ›๋Š” ๊ณต์ • ๋ณ€์ด๊ฐ€ ๋‹ค๋ฅด๋ฏ€๋กœ ์ตœ์†Œ ๋™์ž‘ ์ „์••์€ ์นฉ๋งˆ๋‹ค ๋‹ค๋ฅด๋ฉฐ, ๋ชจ๋‹ˆํ„ฐ๋ง์„ ํ†ตํ•ด ์ด๋ฅผ ์ถ”๋ก ํ•ด๋‚ผ ์ˆ˜ ์žˆ๋‹ค๋ฉด ์นฉ๋ณ„๋กœ SRAM์— ์„œ๋กœ ๋‹ค๋ฅธ ์ „์••์„ ์ธ๊ฐ€ํ•ด ์—๋„ˆ์ง€ ํšจ์œจ์„ฑ์„ ๋†’์ผ ์ˆ˜ ์žˆ๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์€ ๊ณผ์ •์„ ํ†ตํ•ด ์ด ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•œ๋‹ค: (1) ๋””์ž์ธ ์ธํ”„๋ผ ์„ค๊ณ„ ๋‹จ๊ณ„์—์„œ๋Š” SRAM์˜ ์ตœ์†Œ ๋™์ž‘ ์ „์••์„ ์ถ”๋ก ํ•˜๊ณ  ์นฉ ์ƒ์‚ฐ ๋‹จ๊ณ„์—์„œ๋Š” SRAM ๋ชจ๋‹ˆํ„ฐ์˜ ์ธก์ •์„ ํ†ตํ•ด ์ „์••์„ ์ธ๊ฐ€ํ•˜๋Š” ๋ฐฉ๋ฒ•๋ก ์„ ์ œ์•ˆํ•œ๋‹ค; (2) ์นฉ์˜ SRAM ๋น„ํŠธ์…€(bitcell)๊ณผ ์ฃผ๋ณ€ ํšŒ๋กœ๋ฅผ ํฌํ•จํ•œ SRAM ๋ธ”๋ก๋“ค์˜ ๊ณต์ • ๋ณ€์ด๋ฅผ ๋ชจ๋‹ˆํ„ฐ๋งํ•  ์ˆ˜ ์žˆ๋Š” SRAM ๋ชจ๋‹ˆํ„ฐ์™€ SRAM ๋ชจ๋‹ˆํ„ฐ์—์„œ ๋ชจ๋‹ˆํ„ฐ๋งํ•  ๋Œ€์ƒ์„ ์ •์˜ํ•œ๋‹ค; (3) SRAM ๋ชจ๋‹ˆํ„ฐ์˜ ์ธก์ •๊ฐ’์„ ์ด์šฉํ•ด ๊ฐ™์€ ์นฉ์— ์กด์žฌํ•˜๋Š” ๋ชจ๋“  SRAM ๋ธ”๋ก์—์„œ ๋ชฉํ‘œ ์‹ ๋ขฐ์ˆ˜์ค€ ๋‚ด์—์„œ ์ฝ๊ธฐ, ์“ฐ๊ธฐ, ๋ฐ ์ ‘๊ทผ ๋™์ž‘ ์‹คํŒจ๊ฐ€ ๋ฐœ์ƒํ•˜์ง€ ์•Š๋Š” ์ตœ์†Œ ๋™์ž‘ ์ „์••์„ ์ถ”๋ก ํ•œ๋‹ค. ๋ฒค์น˜๋งˆํฌ ํšŒ๋กœ์˜ ์‹คํ—˜ ๊ฒฐ๊ณผ๋Š” ๋ณธ ๋…ผ๋ฌธ์—์„œ ์ œ์•ˆํ•œ ๋ฐฉ๋ฒ•์„ ๋”ฐ๋ผ ์นฉ๋ณ„๋กœ SRAM ๋ธ”๋ก๋“ค์˜ ์ตœ์†Œ ๋™์ž‘ ์ „์••์„ ๋‹ค๋ฅด๊ฒŒ ์ธ๊ฐ€ํ•  ๊ฒฝ์šฐ, ๊ธฐ์กด ๋ฐฉ๋ฒ•๋Œ€๋กœ ๋ชจ๋“  ์นฉ์— ๋™์ผํ•œ ์ „์••์„ ์ธ๊ฐ€ํ•˜๋Š” ๊ฒƒ ๋Œ€๋น„ ์ˆ˜์œจ์€ ๊ฐ™์€ ์ˆ˜์ค€์œผ๋กœ ์œ ์ง€ํ•˜๋ฉด์„œ SRAM ๋น„ํŠธ์…€ ๋ฐฐ์—ด์˜ ์ „๋ ฅ ์†Œ๋ชจ๋ฅผ ๊ฐ์†Œ์‹œํ‚ฌ ์ˆ˜ ์žˆ์Œ์„ ๋ณด์ธ๋‹ค. ๋‘ ๋ฒˆ์งธ๋กœ, ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ํŒŒ์›Œ ๊ฒŒ์ดํŠธ ํšŒ๋กœ์—์„œ ๊ธฐ์กด์˜ ๋ณด์กด์šฉ ๊ณต๊ฐ„ ํ• ๋‹น ๋ฐฉ๋ฒ•๋“ค์ด ์ง€๋‹ˆ๊ณ  ์žˆ๋Š” ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ณ  ๋ˆ„์„ค ์ „๋ ฅ ์†Œ๋ชจ๋ฅผ ๋” ์ค„์ผ ์ˆ˜ ์žˆ๋Š” ๋ฐฉ๋ฒ•๋ก ์„ ์ œ์•ˆํ•œ๋‹ค. ๊ธฐ์กด์˜ ๋ณด์กด์šฉ ๊ณต๊ฐ„ ํ• ๋‹น ๋ฐฉ๋ฒ•์€ ๋ฉ€ํ‹ฐํ”Œ๋ ‰์„œ ํ”ผ๋“œ๋ฐฑ ๋ฃจํ”„๊ฐ€ ์žˆ๋Š” ๋ชจ๋“  ํ”Œ๋ฆฝํ”Œ๋กญ์—๋Š” ๋ฌด์กฐ๊ฑด ๋ณด์กด์šฉ ๊ณต๊ฐ„์„ ํ• ๋‹นํ•ด์•ผ ํ•ด์•ผ ํ•˜๊ธฐ ๋•Œ๋ฌธ์— ๋‹ค์ค‘ ๋น„ํŠธ ๋ณด์กด์šฉ ๊ณต๊ฐ„์˜ ์žฅ์ ์„ ์ถฉ๋ถ„ํžˆ ์‚ด๋ฆฌ์ง€ ๋ชปํ•˜๋Š” ๋ฌธ์ œ๊ฐ€ ์žˆ๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์€ ๋ฐฉ๋ฒ•์„ ํ†ตํ•ด ๋ณด์กด์šฉ ๊ณต๊ฐ„์„ ์ตœ์†Œํ™”ํ•˜๋Š” ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•œ๋‹ค: (1) ๋ณด์กด์šฉ ๊ณต๊ฐ„ ํ• ๋‹น ๊ณผ์ •์—์„œ ๋ฉ€ํ‹ฐํ”Œ๋ ‰์„œ ํ”ผ๋“œ๋ฐฑ ๋ฃจํ”„๋ฅผ ๋ฌด์‹œํ•  ์ˆ˜ ์žˆ๋Š” ์กฐ๊ฑด์„ ์ œ์‹œํ•˜๊ณ , (2) ํ•ด๋‹น ์กฐ๊ฑด์„ ์ด์šฉํ•ด ๋ฉ€ํ‹ฐํ”Œ๋ ‰์„œ ํ”ผ๋“œ๋ฐฑ ๋ฃจํ”„๊ฐ€ ์žˆ๋Š” ํ”Œ๋ฆฝํ”Œ๋กญ์ด ๋งŽ์ด ์กด์žฌํ•˜๋Š” ํšŒ๋กœ์—์„œ ๋ณด์กด์šฉ ๊ณต๊ฐ„์„ ์ตœ์†Œํ™”ํ•œ๋‹ค; (3) ์ถ”๊ฐ€๋กœ, ํ”Œ๋ฆฝํ”Œ๋กญ์— ์ด๋ฏธ ํ• ๋‹น๋œ ๋ณด์กด์šฉ ๊ณต๊ฐ„ ์ค‘ ์ผ๋ถ€๋ฅผ ์ œ๊ฑฐํ•  ์ˆ˜ ์žˆ๋Š” ์กฐ๊ฑด์„ ์ฐพ๊ณ , ์ด๋ฅผ ์ด์šฉํ•ด ๋ณด์กด์šฉ ๊ณต๊ฐ„์„ ๋” ๊ฐ์†Œ์‹œํ‚จ๋‹ค. ๋ฒค์น˜๋งˆํฌ ํšŒ๋กœ์˜ ์‹คํ—˜ ๊ฒฐ๊ณผ๋Š” ๋ณธ ๋…ผ๋ฌธ์—์„œ ์ œ์•ˆํ•œ ๋ฐฉ๋ฒ•๋ก ์ด ๊ธฐ์กด์˜ ๋ณด์กด์šฉ ๊ณต๊ฐ„ ํ• ๋‹น ๋ฐฉ๋ฒ•๋ก ๋ณด๋‹ค ๋” ์ ์€ ๋ณด์กด์šฉ ๊ณต๊ฐ„์„ ํ• ๋‹นํ•˜๋ฉฐ, ๋”ฐ๋ผ์„œ ์นฉ์˜ ๋ฉด์  ๋ฐ ์ „๋ ฅ ์†Œ๋ชจ๋ฅผ ๊ฐ์†Œ์‹œํ‚ฌ ์ˆ˜ ์žˆ์Œ์„ ๋ณด์ธ๋‹ค.Low power operation of a chip is an important issue, and its importance is increasing as the process technology advances. This dissertation addresses the methodology of operating at low power for each of the SRAM and logic constituting the chip. Firstly, we propose a methodology to infer the minimum operating voltage at which SRAM failure does not occur in all SRAM blocks in the chip operating on near threshold voltage (NTV) regime through the measurement of a monitoring circuit. Operating the chip on NTV regime is one of the most effective ways to increase energy efficiency, but in case of SRAM, it is difficult to lower the operating voltage because of SRAM failure. However, since the process variation on each chip is different, the minimum operating voltage is also different for each chip. If it is possible to infer the minimum operating voltage of SRAM blocks of each chip through monitoring, energy efficiency can be increased by applying different voltage. In this dissertation, we propose a new methodology of resolving this problem. Specifically, (1) we propose to infer minimum operation voltage of SRAM in design infra development phase, and assign the voltage using measurement of SRAM monitor in silicon production phase; (2) we define a SRAM monitor and features to be monitored that can monitor process variation on SRAM blocks including SRAM bitcell and peripheral circuits; (3) we propose a new methodology of inferring minimum operating voltage of SRAM blocks in a chip that does not cause read, write, and access failures under a target confidence level. Through experiments with benchmark circuits, it is confirmed that applying different voltage to SRAM blocks in each chip that inferred by our proposed methodology can save overall power consumption of SRAM bitcell array compared to applying same voltage to SRAM blocks in all chips, while meeting the same yield target. Secondly, we propose a methodology to resolve the problem of the conventional retention storage allocation methods and thereby further reduce leakage power consumption of power gated circuit. Conventional retention storage allocation methods have problem of not fully utilizing the advantage of multi-bit retention storage because of the unavoidable allocation of retention storage on flip-flops with mux-feedback loop. In this dissertation, we propose a new methodology of breaking the bottleneck of minimizing the state retention storage. Specifically, (1) we find a condition that mux-feedback loop can be disregarded during the retention storage allocation; (2) utilizing the condition, we minimize the retention storage of circuits that contain many flip-flops with mux-feedback loop; (3) we find a condition to remove some of the retention storage already allocated to each of flip-flops and propose to further reduce the retention storage. Through experiments with benchmark circuits, it is confirmed that our proposed methodology allocates less retention storage compared to the state-of-the-art methods, occupying less cell area and consuming less power.1 Introduction 1 1.1 Low Voltage SRAM Monitoring Methodology 1 1.2 Retention Storage Allocation on Power Gated Circuit 5 1.3 Contributions of this Dissertation 8 2 SRAM On-Chip Monitoring Methodology for High Yield and Energy Efficient Memory Operation at Near Threshold Voltage 13 2.1 SRAM Failures 13 2.1.1 Read Failure 13 2.1.2 Write Failure 15 2.1.3 Access Failure 16 2.1.4 Hold Failure 16 2.2 SRAM On-chip Monitoring Methodology: Bitcell Variation 18 2.2.1 Overall Flow 18 2.2.2 SRAM Monitor and Monitoring Target 18 2.2.3 Vfail to Vddmin Inference 22 2.3 SRAM On-chip Monitoring Methodology: Peripheral Circuit IR Drop and Variation 29 2.3.1 Consideration of IR Drop 29 2.3.2 Consideration of Peripheral Circuit Variation 30 2.3.3 Vddmin Prediction including Access Failure Prohibition 33 2.4 Experimental Results 41 2.4.1 Vddmin Considering Read and Write Failures 42 2.4.2 Vddmin Considering Read/Write and Access Failures 45 2.4.3 Observation for Practical Use 45 3 Allocation of Always-On State Retention Storage for Power Gated Circuits - Steady State Driven Approach 49 3.1 Motivations and Analysis 49 3.1.1 Impact of Self-loop on Power Gating 49 3.1.2 Circuit Behavior Before Sleeping 52 3.1.3 Wakeup Latency vs. Retention Storage 54 3.2 Steady State Driven Retention Storage Allocation 56 3.2.1 Extracting Steady State Self-loop FFs 57 3.2.2 Allocating State Retention Storage 59 3.2.3 Designing and Optimizing Steady State Monitoring Logic 59 3.2.4 Analysis of the Impact of Steady State Monitoring Time on the Standby Power 63 3.3 Retention Storage Refinement Utilizing Steadiness 65 3.3.1 Extracting Flip-flops for Retention Storage Refinement 66 3.3.2 Designing State Monitoring Logic and Control Signals 68 3.4 Experimental Results 73 3.4.1 Comparison of State Retention Storage 75 3.4.2 Comparison of Power Consumption 79 3.4.3 Impact on Circuit Performance 82 3.4.4 Support for Immediate Power Gating 83 4 Conclusions 89 4.1 Chapter 2 89 4.2 Chapter 3 90๋ฐ•

    Efficient cache architectures for reliable hybrid voltage operation using EDC codes

    Get PDF
    Semiconductor technology evolution enables the design of sensor-based battery-powered ultra-low-cost chips (e.g., below 1 p) required for new market segments such as body, urban life and environment monitoring. Caches have been shown to be the highest energy and area consumer in those chips. This paper proposes a novel, hybrid-operation (high Vcc, ultra-low Vcc), single-Vcc domain cache architecture based on replacing energy-hungry bitcells (e.g., 10T) by more energy-efficient and smaller cells (e.g., 8T) enhanced with Error Detection and Correction (EDC) features for high reliability and performance predictability. Our architecture is proven to largely outperform existing solutions in terms of energy and area.Postprint (authorโ€™s final draft

    Circuits and Systems Advances in Near Threshold Computing

    Get PDF
    Modern society is witnessing a sea change in ubiquitous computing, in which people have embraced computing systems as an indispensable part of day-to-day existence. Computation, storage, and communication abilities of smartphones, for example, have undergone monumental changes over the past decade. However, global emphasis on creating and sustaining green environments is leading to a rapid and ongoing proliferation of edge computing systems and applications. As a broad spectrum of healthcare, home, and transport applications shift to the edge of the network, near-threshold computing (NTC) is emerging as one of the promising low-power computing platforms. An NTC device sets its supply voltage close to its threshold voltage, dramatically reducing the energy consumption. Despite showing substantial promise in terms of energy efficiency, NTC is yet to see widescale commercial adoption. This is because circuits and systems operating with NTC suffer from several problems, including increased sensitivity to process variation, reliability problems, performance degradation, and security vulnerabilities, to name a few. To realize its potential, we need designs, techniques, and solutions to overcome these challenges associated with NTC circuits and systems. The readers of this book will be able to familiarize themselves with recent advances in electronics systems, focusing on near-threshold computing

    Energy-Aware Data Movement In Non-Volatile Memory Hierarchies

    Get PDF
    While technology scaling enables increased density for memory cells, the intrinsic high leakage power of conventional CMOS technology and the demand for reduced energy consumption inspires the use of emerging technology alternatives such as eDRAM and Non-Volatile Memory (NVM) including STT-MRAM, PCM, and RRAM. The utilization of emerging technology in Last Level Cache (LLC) designs which occupies a signifcant fraction of total die area in Chip Multi Processors (CMPs) introduces new dimensions of vulnerability, energy consumption, and performance delivery. To be specific, a part of this research focuses on eDRAM Bit Upset Vulnerability Factor (BUVF) to assess vulnerable portion of the eDRAM refresh cycle where the critical charge varies depending on the write voltage, storage and bit-line capacitance. This dissertation broaden the study on vulnerability assessment of LLC through investigating the impact of Process Variations (PV) on narrow resistive sensing margins in high-density NVM arrays, including on-chip cache and primary memory. Large-latency and power-hungry Sense Amplifers (SAs) have been adapted to combat PV in the past. Herein, a novel approach is proposed to leverage the PV in NVM arrays using Self-Organized Sub-bank (SOS) design. SOS engages the preferred SA alternative based on the intrinsic as-built behavior of the resistive sensing timing margin to reduce the latency and power consumption while maintaining acceptable access time. On the other hand, this dissertation investigates a novel technique to prioritize the service to 1) Extensive Read Reused Accessed blocks of the LLC that are silently dropped from higher levels of cache, and 2) the portion of the working set that may exhibit distant re-reference interval in L2. In particular, we develop a lightweight Multi-level Access History Profiler to effciently identify ERRA blocks through aggregating the LLC block addresses tagged with identical Most Signifcant Bits into a single entry. Experimental results indicate that the proposed technique can reduce the L2 read miss ratio by 51.7% on average across PARSEC and SPEC2006 workloads. In addition, this dissertation will broaden and apply advancements in theories of subspace recovery to pioneer computationally-aware in-situ operand reconstruction via the novel Logic In Interconnect (LI2) scheme. LI2 will be developed, validated, and re?ned both theoretically and experimentally to realize a radically different approach to post-Moore\u27s Law computing by leveraging low-rank matrices features offering data reconstruction instead of fetching data from main memory to reduce energy/latency cost per data movement. We propose LI2 enhancement to attain high performance delivery in the post-Moore\u27s Law era through equipping the contemporary micro-architecture design with a customized memory controller which orchestrates the memory request for fetching low-rank matrices to customized Fine Grain Reconfigurable Accelerator (FGRA) for reconstruction while the other memory requests are serviced as before. The goal of LI2 is to conquer the high latency/energy required to traverse main memory arrays in the case of LLC miss, by using in-situ construction of the requested data dealing with low-rank matrices. Thus, LI2 exchanges a high volume of data transfers with a novel lightweight reconstruction method under specific conditions using a cross-layer hardware/algorithm approach

    Containing Analog Data Deluge at Edge through Frequency-Domain Compression in Collaborative Compute-in-Memory Networks

    Full text link
    Edge computing is a promising solution for handling high-dimensional, multispectral analog data from sensors and IoT devices for applications such as autonomous drones. However, edge devices' limited storage and computing resources make it challenging to perform complex predictive modeling at the edge. Compute-in-memory (CiM) has emerged as a principal paradigm to minimize energy for deep learning-based inference at the edge. Nevertheless, integrating storage and processing complicates memory cells and/or memory peripherals, essentially trading off area efficiency for energy efficiency. This paper proposes a novel solution to improve area efficiency in deep learning inference tasks. The proposed method employs two key strategies. Firstly, a Frequency domain learning approach uses binarized Walsh-Hadamard Transforms, reducing the necessary parameters for DNN (by 87% in MobileNetV2) and enabling compute-in-SRAM, which better utilizes parallelism during inference. Secondly, a memory-immersed collaborative digitization method is described among CiM arrays to reduce the area overheads of conventional ADCs. This facilitates more CiM arrays in limited footprint designs, leading to better parallelism and reduced external memory accesses. Different networking configurations are explored, where Flash, SA, and their hybrid digitization steps can be implemented using the memory-immersed scheme. The results are demonstrated using a 65 nm CMOS test chip, exhibiting significant area and energy savings compared to a 40 nm-node 5-bit SAR ADC and 5-bit Flash ADC. By processing analog data more efficiently, it is possible to selectively retain valuable data from sensors and alleviate the challenges posed by the analog data deluge.Comment: arXiv admin note: text overlap with arXiv:2307.03863, arXiv:2309.0177

    Energy Efficient Network-on-Chip Architectures for Many-Core Near-Threshold Computing System

    Get PDF
    Near threshold computing has unraveled a promising design space for energy efficient computing. However, it is still plagued by sub-optimal system performance. Application characteristics and hardware non-idealities of conventional architectures (those optimized for nominal voltage) prevent us from fully leveraging the potential of NTC systems. Increasing the computational core count still forms the bedrock of a multitude of contemporary works that address the problem of performance degradation in NTC systems. However, these works do not categorically address the shortcomings of the conventional on-chip interconnect fabric in a many core environment. In this work, we quantitatively demonstrate the performance bottleneck created by a conventional NTC architecture in many-core NTC systems. To reclaim the performance lost due to a sub-optimal NoC in many-core NTC systems, we propose BoostNoCโ€”a power efficient, multi-layered network-on-chip architecture. BoostNoC improves the system performance by nearly 2ร— over a conventional NTC system, while largely sustaining its energy benefits. Further, capitalizing on the application characteristics, we propose two BoostNoC derivative designs: (i) PG BoostNoC; and (ii) Drowsy BoostNoC; to improve the energy efficiency by 1.4ร— and 1.37ร—, respectively over conventional NTC system

    Low power digital baseband core for wireless Micro-Neural-Interface using CMOS sub/near-threshold circuit

    Get PDF
    This thesis presents the work on designing and implementing a low power digital baseband core with custom-tailored protocol for wirelessly powered Micro-Neural-Interface (MNI) System-on-Chip (SoC) to be implanted within the skull to record cortical neural activities. The core, on the tag end of distributed sensors, is designed to control the operation of individual MNI and communicate and control MNI devices implanted across the brain using received downlink commands from external base station and store/dump targeted neural data uplink in an energy efficient manner. The application specific protocol defines three modes (Time Stamp Mode, Streaming Mode and Snippet Mode) to extract neural signals with on-chip signal conditioning and discrimination. In Time Stamp Mode, Streaming Mode and Snippet Mode, the core executes basic on-chip spike discrimination and compression, real-time monitoring and segment capturing of neural signals so single spike timing as well as inter-spike timing can be retrieved with high temporal and spatial resolution. To implement the core control logic using sub/near-threshold logic, a novel digital design methodology is proposed which considers INWE (Inverse-Narrow-Width-Effect), RSCE (Reverse-Short-Channel-Effect) and variation comprehensively to size the transistor width and length accordingly to achieve close-to-optimum digital circuits. Ultra-low-power cell library containing 67 cells including physical cells and decoupling capacitor cells using the optimum fingers is designed, laid-out, characterized, and abstracted. A robust on-chip sense-amp-less SRAM memory (8X32 size) for storing neural data is implemented using 8T topology and LVT fingers. The design is validated with silicon tapeout and measurement shows the digital baseband core works at 400mV and 1.28 MHz system clock with an average power consumption of 2.2 ฮผW, resulting in highest reported communication power efficiency of 290Kbps/ฮผW to date
    • โ€ฆ
    corecore