200 research outputs found
High-Performance Energy-Efficient and Reliable Design of Spin-Transfer Torque Magnetic Memory
In this dissertation new computing paradigms, architectures and design philosophy are proposed and evaluated for adopting the STT-MRAM technology as highly reliable, energy efficient and fast memory. For this purpose, a novel cross-layer framework from the cell-level all the way up to the system- and application-level has been developed. In these framework, the reliability issues are modeled accurately with appropriate fault models at different abstraction levels in order to analyze the overall failure rates of the entire memory and its Mean Time To Failure (MTTF) along with considering the temperature and process variation effects. Design-time, compile-time and run-time solutions have been provided to address the challenges associated with STT-MRAM. The effectiveness of the proposed solutions is demonstrated in extensive experiments that show significant improvements in comparison to state-of-the-art solutions, i.e. lower-power, higher-performance and more reliable STT-MRAM design
A Study on Performance and Power Efficiency of Dense Non-Volatile Caches in Multi-Core Systems
In this paper, we present a novel cache design based on Multi-Level Cell
Spin-Transfer Torque RAM (MLC STTRAM) that can dynamically adapt the set
capacity and associativity to use efficiently the full potential of MLC STTRAM.
We exploit the asymmetric nature of the MLC storage scheme to build cache lines
featuring heterogeneous performances, that is, half of the cache lines are
read-friendly, while the other is write-friendly. Furthermore, we propose to
opportunistically deactivate ways in underutilized sets to convert MLC to
Single-Level Cell (SLC) mode, which features overall better performance and
lifetime. Our ultimate goal is to build a cache architecture that combines the
capacity advantages of MLC and performance/energy advantages of SLC. Our
experiments show an improvement of 43% in total numbers of conflict misses, 27%
in memory access latency, 12% in system performance, and 26% in LLC access
energy, with a slight degradation in cache lifetime (about 7%) compared to an
SLC cache
DeepNVM++: Cross-Layer Modeling and Optimization Framework of Non-Volatile Memories for Deep Learning
Non-volatile memory (NVM) technologies such as spin-transfer torque magnetic
random access memory (STT-MRAM) and spin-orbit torque magnetic random access
memory (SOT-MRAM) have significant advantages compared to conventional SRAM due
to their non-volatility, higher cell density, and scalability features. While
previous work has investigated several architectural implications of NVM for
generic applications, in this work we present DeepNVM++, a framework to
characterize, model, and analyze NVM-based caches in GPU architectures for deep
learning (DL) applications by combining technology-specific circuit-level
models and the actual memory behavior of various DL workloads. We present both
iso-capacity and iso-area performance and energy analysis for systems whose
last-level caches rely on conventional SRAM and emerging STT-MRAM and SOT-MRAM
technologies. In the iso-capacity case, STT-MRAM and SOT-MRAM provide up to
3.8x and 4.7x energy-delay product (EDP) reduction and 2.4x and 2.8x area
reduction compared to conventional SRAM, respectively. Under iso-area
assumptions, STT-MRAM and SOT-MRAM provide up to 2x and 2.3x EDP reduction and
accommodate 2.3x and 3.3x cache capacity when compared to SRAM, respectively.
We also perform a scalability analysis and show that STT-MRAM and SOT-MRAM
achieve orders of magnitude EDP reduction when compared to SRAM for large cache
capacities. Our comprehensive cross-layer framework is demonstrated on
STT-/SOT-MRAM technologies and can be used for the characterization, modeling,
and analysis of any NVM technology for last-level caches in GPUs for DL
applications.Comment: 12 pages, 10 figure
Reliable and Energy Efficient MLC STT-RAM Buffer for CNN Accelerators
We propose a lightweight scheme where the formation of a data block is changed in such a way that it can tolerate soft errors significantly better than the baseline. The key insight behind our work is that CNN weights are normalized between -1 and 1 after each convolutional layer, and this leaves one bit unused in half-precision floating-point representation. By taking advantage of the unused bit, we create a backup for the most significant bit to protect it against the soft errors. Also, considering the fact that in MLC STT-RAMs the cost of memory operations (read and write), and reliability of a cell are content-dependent (some patterns take larger current and longer time, while they are more susceptible to soft error), we rearrange the data block to minimize the number of costly bit patterns. Combining these two techniques provides the same level of accuracy compared to an error-free baseline while improving the read and write energy by 9% and 6%, respectively
Exploring Spin-transfer-torque devices and memristors for logic and memory applications
As scaling CMOS devices is approaching its physical limits, researchers have begun exploring newer devices and architectures to replace CMOS.
Due to their non-volatility and high density, Spin Transfer Torque (STT) devices are among the most prominent candidates for logic and memory applications. In this research, we first considered a new logic style called All Spin Logic (ASL). Despite its advantages, ASL consumes a large amount of static power; thus, several optimizations can be performed to address this issue. We developed a systematic methodology to perform the optimizations to ensure stable operation of ASL.
Second, we investigated reliable design of STT-MRAM bit-cells and addressed the conflicting read and write requirements, which results in overdesign of the bit-cells. Further, a Device/Circuit/Architecture co-design framework was developed to optimize the STT-MRAM devices by exploring the design space through jointly considering yield enhancement techniques at different levels of abstraction.
Recent advancements in the development of memristive devices have opened new opportunities for hardware implementation of non-Boolean computing. To this end, the suitability of memristive devices for swarm intelligence algorithms has enabled researchers to solve a maze in hardware. In this research, we utilized swarm intelligence of memristive networks to perform image edge detection. First, we proposed a hardware-friendly algorithm for image edge detection based on ant colony. Next, we designed the image edge detection algorithm using memristive networks
- …