5 research outputs found

    Efficient Placement and Migration Policies for an STT-RAM based Hybrid L1 Cache for Intermittently Powered Systems

    The number of battery-powered devices is rapidly increasing due to the widespread use of IoT-enabled nodes in various fields. Energy harvesters are a feasible alternative to batteries for powering embedded devices. The energy harvester stores enough energy in a capacitor to power up the embedded device and compute a task. Because energy harvesters are unable to supply continuous power to embedded devices, this type of computation is referred to as intermittent computing. All registers and caches in conventional processors are volatile, so we require a Non-Volatile Memory (NVM)-based Non-Volatile Processor (NVP) that can retain register and cache contents during a power failure. However, NVM-based caches reduce system performance and consume more energy than SRAM-based caches. This paper proposes Efficient Placement and Migration policies for a hybrid cache architecture that uses SRAM and STT-RAM at the first-level cache. The proposed architecture includes cache block placement and migration policies to reduce the number of writes to STT-RAM. During a power failure, the backup strategy identifies and migrates the critical blocks from SRAM to STT-RAM. When compared to the baseline architecture, the proposed architecture reduces STT-RAM writes from 63.35% to 35.93%, resulting in a 32.85% performance gain and a 23.42% reduction in energy consumption. Our backup strategy reduces backup time by 34.46% when compared to the baseline.

    ARCHITECTING EMERGING MEMORY TECHNOLOGIES FOR ENERGY-EFFICIENT COMPUTING IN MODERN PROCESSORS

    Ph.D., Doctor of Philosophy

    Technology Implications for Large Last-Level Caches

    A large last-level cache (L3C) is effective at bridging the performance and power gap between processor and memory. Several memory technologies, including SRAM, STT-RAM (MRAM), and embedded DRAM (eDRAM), have been used or considered as the technology to implement L3Cs. However, each of them has inherent weaknesses: SRAM has relatively low density and high leakage; STT-RAM has long write latency and requires high write energy; eDRAM requires refresh. As future processors are expected to have larger last-level caches, the objective of this dissertation is to study the tradeoffs associated with using each of these technologies to implement L3Cs. In order to make useful comparisons between L3Cs built with SRAM, STT-RAM, and eDRAM, we consider and implement several levels of detail. First, to obtain unbiased cache performance and power properties (i.e., read/write access latency, read/write access energy, leakage power, refresh power, area), we prototype caches based on realistic memory and device models. Second, we present simplistic analytical models that enable us to quickly examine different memory technologies under various scenarios. Third, we review power-optimization techniques for each of the technologies, and propose using a low-cost dead-line prediction scheme for eDRAM-based L3Cs to eliminate unnecessary refreshes. Finally, the highlight of this dissertation is the comparison and analysis of low-leakage SRAM, low write-energy STT-RAM, and refresh-optimized eDRAM. We report system performance, last-level cache energy breakdown, and memory hierarchy energy breakdown, using an augmented full-system simulator with the execution of a range of workloads and input sets. From the insights gained through simulation results, STT-RAM has the highest potential to save energy in future L3C designs. For contemporary processors, SRAM-based L3C results in the fastest system performance, whereas eDRAM consumes the lowest energy.

    Straintronics: A Leap towards Ultimate Energy Efficiency of Magnetic Memory and Logic

    After decades of exponential growth of the semiconductor industry, as predicted by Moore’s Law, complementary metal-oxide semiconductor (CMOS) circuits are approaching the end of the road as feature sizes reach sub-10 nm regimes, leaving electrical engineers with a profusion of design challenges in terms of energy limitations and power density. This has left the road wide open for alternative technologies to help CMOS overcome the present challenges. Magnetic random access memory (MRAM) is one of the candidates to assist with these obstacles. Proposed in the early 1990s, MRAM has been under research and development for decades. The pursuit of energy-efficient MRAM is driven by the fact that magnetic logic potentially has orders of magnitude lower switching energy than charge-based CMOS logic since, in a nanomagnet, magnetic domains self-align with each other. Regrettably, the conventional methods for switching the state of an MRAM cell, field-induced magnetization switching (FIMS) and spin transfer torque (STT), use electric current (a flow of charges) to switch the state of the magnet, nullifying the energy advantage stated above. To maximize energy efficiency, the amount of charge required to switch the state of the magnetic tunnel junction (MTJ) should be minimized. To this end, straintronics has recently been proposed as an energy-efficient alternative to FIMS and STT for switching the state of a nanomagnet. By combining piezoelectricity and inverse magnetostriction, the method can flip the magnetization state of the device within a few nanoseconds while reducing the switching energy by orders of magnitude compared to STT and FIMS. This research focuses on the analysis, design, modeling, and applications of the straintronics-based MTJ. The first goal is to perform an in-depth analysis of the static and dynamic behavior of the device.
Next, we aim to increase the accuracy of the model by including the effect of temperature and thermal noise on the device’s behavior. The goal of this analysis is to create a comprehensive model of the device that predicts both the static and dynamic responses of the magnetization to applied stress. The model will be used to interface the device with CMOS controllers and switches in large systems. Next, in an attempt to speed up the simulation of such devices in multi-megabyte memory systems, a liberal model has been developed by analytically approximating a solution to the magnetization dynamics, which would otherwise have to be solved numerically. The liberal model demonstrates more than two orders of magnitude speed improvement compared to conventional numerical models. Another goal of the research is to highlight the applications of straintronics devices by combining them with peripheral CMOS circuitry. The design of a proof-of-concept 2-kilobit nonvolatile straintronics-based memory was introduced in our recent work. To highlight the potential applications of the straintronics device beyond data storage, the use of the principle in ultra-fast yet low-power true random number generation and in neuron/synapse design for artificial neural networks has been investigated. Lastly, to investigate the practicality of the straintronics principle, the effect of process variations and interface imperfections on the switching behavior of the magnetization is investigated. The results reveal the destructive effect of fabrication imperfections on the switching pattern of the device, leaving careful pulse-shaping, alternative topologies, or combination with STT as the last resorts for successful strain-based magnetization switching.
    PhD, Electrical Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
    https://deepblue.lib.umich.edu/bitstream/2027.42/137010/1/barangi_1.pd

    STT-RAM cache hierarchy with multiretention MTJ designs

    DOI: 10.1109/TVLSI.2013.2267754. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 22, no. 6, pp. 1281-1293. IEVS