Spin-Transfer-Torque (STT) Devices for On-chip Memory and Their Applications to Low-standby Power Systems
With the scaling of CMOS technology, leakage power accounts for an increasing share of total power consumption; in high-performance processors it may account for almost half. To reduce leakage power, there is growing interest in using nonvolatile storage devices for memory applications. Among the various promising nonvolatile memory elements, spin-transfer torque magnetic RAM (STT-MRAM) has been identified as one of the most attractive alternatives to conventional SRAM. However, STT-MRAM faces several design challenges, such as shared read and write current paths, single-ended sensing, and high dynamic power, that must be overcome to make it suitable for on-chip memories. To mitigate these problems, we propose a domain wall coupling based spin-transfer torque (DWCSTT) device for on-chip caches. Our proposed DWCSTT bit-cell decouples the read and write current paths through an electrically insulating magnetic coupling layer, so that the read operation can be optimized separately without affecting write-ability. In addition, the complementary polarizer structure in the read path of the DWCSTT device enables self-referenced differential sensing. DWCSTT bit-cells also reduce write power consumption owing to the low electrical resistance of the write current path. Furthermore, we present three bit-cell level design techniques for Spin-Orbit Torque MRAM (SOT-MRAM) that alleviate some of the inefficiencies of conventional magnetic memories while retaining the advantages of the novel SOT-based switching mechanism, such as a low write current requirement and decoupled read and write current paths. First, our proposed SOT-MRAM supporting dual read/write ports (1R/1W) can address the high write latency of STT-MRAM by allowing simultaneous 1R/1W accesses.
Second, we propose a new type of SOT-MRAM that uses only one access transistor along with a Schottky diode, mitigating the area overhead caused by the two access transistors in conventional SOT-MRAM. Finally, we present a new SOT-MRAM design technique that improves integration density by utilizing a shared bit-line structure.
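The self-referenced differential sensing enabled by the complementary polarizers in DWCSTT can be illustrated with a toy model. The sketch below assumes, hypothetically, that a stored bit always leaves one read MTJ parallel (low resistance) and the other antiparallel (high resistance), so the sense amplifier compares the two branch currents against each other rather than against a separate reference cell. The values of `R_P`, `R_AP` and `V_READ` and all function names are illustrative assumptions, not device parameters from the source.

```python
# Toy model of self-referenced differential sensing with complementary read MTJs.
# All electrical values are hypothetical placeholders, not measured device data.
R_P = 5e3     # assumed parallel-state MTJ resistance (ohms)
R_AP = 10e3   # assumed antiparallel-state MTJ resistance (ohms)
V_READ = 0.1  # assumed read voltage (volts)


def branch_resistances(bit: int) -> tuple[float, float]:
    """Return (left, right) read-path resistances for a stored bit.

    Complementary polarizers keep the two read MTJs in opposite states."""
    return (R_P, R_AP) if bit == 1 else (R_AP, R_P)


def differential_read(bit: int) -> int:
    """Recover the bit by comparing the two branch currents with each other."""
    r_left, r_right = branch_resistances(bit)
    return 1 if V_READ / r_left > V_READ / r_right else 0


def sense_margin(bit: int) -> float:
    """Current difference seen by a differential sense amplifier (amperes)."""
    r_left, r_right = branch_resistances(bit)
    return abs(V_READ / r_left - V_READ / r_right)


def single_ended_margin(bit: int) -> float:
    """Margin of a single-ended read against an ideal mid-point reference."""
    i_cell = V_READ / (R_P if bit == 1 else R_AP)
    i_ref = (V_READ / R_P + V_READ / R_AP) / 2
    return abs(i_cell - i_ref)
```

In this idealized model the differential margin is exactly twice the single-ended margin, and no reference cell has to track process variation; real sense margins depend on device variability that the sketch ignores.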
Exploring Spin-transfer-torque devices and memristors for logic and memory applications
As CMOS scaling approaches its physical limits, researchers have begun exploring new devices and architectures to replace CMOS.
Owing to their non-volatility and high density, Spin Transfer Torque (STT) devices are among the most prominent candidates for logic and memory applications. In this research, we first considered a new logic style called All Spin Logic (ASL). Despite its advantages, ASL consumes a large amount of static power, which can be reduced through several optimizations. We developed a systematic methodology for performing these optimizations while ensuring stable operation of ASL.
Second, we investigated the reliable design of STT-MRAM bit-cells and addressed the conflicting read and write requirements that result in overdesign of the bit-cells. Further, we developed a Device/Circuit/Architecture co-design framework that optimizes STT-MRAM devices by exploring the design space while jointly considering yield enhancement techniques at different levels of abstraction.
Recent advancements in the development of memristive devices have opened new opportunities for hardware implementation of non-Boolean computing. In particular, the suitability of memristive devices for swarm intelligence algorithms has enabled researchers to solve a maze in hardware. In this research, we utilized the swarm intelligence of memristive networks to perform image edge detection. First, we proposed a hardware-friendly image edge detection algorithm based on ant colony optimization. Next, we designed an implementation of this algorithm using memristive networks.
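The abstract does not spell out the hardware-friendly algorithm, so the sketch below shows only the generic ant-colony-optimization (ACO) edge-detection idea it builds on, in plain Python rather than memristive hardware. Ants wander over pixels, prefer neighbors with high pheromone τ and high local intensity variation η, deposit pheromone proportional to η, and all pheromone evaporates at rate ρ after each step. The parameter defaults, the 8-neighbor move rule and the mid-range threshold are assumptions chosen for illustration.

```python
import random


def heuristic(img):
    """Local intensity variation eta at each pixel, normalized to [0, 1]."""
    h, w = len(img), len(img[0])

    def px(i, j):  # clamp indices at the image border
        return img[min(max(i, 0), h - 1)][min(max(j, 0), w - 1)]

    eta = [[abs(px(i - 1, j - 1) - px(i + 1, j + 1))
            + abs(px(i - 1, j + 1) - px(i + 1, j - 1))
            + abs(px(i, j - 1) - px(i, j + 1))
            + abs(px(i - 1, j) - px(i + 1, j))
            for j in range(w)] for i in range(h)]
    peak = max(max(row) for row in eta) or 1.0  # avoid dividing by zero on flat images
    return [[v / peak for v in row] for row in eta]


def aco_edges(img, n_ants=16, n_steps=40, alpha=1.0, beta=2.0,
              rho=0.1, tau0=0.1, seed=0):
    """Return the set of (row, col) pixels classified as edges."""
    rng = random.Random(seed)
    h, w = len(img), len(img[0])
    eta = heuristic(img)
    tau = [[tau0] * w for _ in range(h)]
    ants = [(rng.randrange(h), rng.randrange(w)) for _ in range(n_ants)]
    for _ in range(n_steps):
        for k, (i, j) in enumerate(ants):
            nbrs = [(i + di, j + dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)
                    if (di or dj) and 0 <= i + di < h and 0 <= j + dj < w]
            # Move probability ~ tau^alpha * eta^beta; the small epsilon keeps
            # ants moving through flat regions where eta == 0.
            weights = [tau[a][b] ** alpha * (eta[a][b] + 1e-6) ** beta
                       for a, b in nbrs]
            a, b = rng.choices(nbrs, weights=weights)[0]
            tau[a][b] += eta[a][b]  # deposit proportional to local variation
            ants[k] = (a, b)
        # Evaporation applied uniformly to every pixel after each step.
        tau = [[(1 - rho) * v for v in row] for row in tau]
    lo = min(min(row) for row in tau)
    hi = max(max(row) for row in tau)
    threshold = lo + 0.5 * (hi - lo)  # simple mid-range threshold (assumption)
    return {(i, j) for i in range(h) for j in range(w) if tau[i][j] > threshold}
```

On a vertical step image, pheromone can only accumulate where η > 0, so every detected pixel lies on the two columns flanking the step.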
DeepNVM++: Cross-Layer Modeling and Optimization Framework of Non-Volatile Memories for Deep Learning
Non-volatile memory (NVM) technologies such as spin-transfer torque magnetic
random access memory (STT-MRAM) and spin-orbit torque magnetic random access
memory (SOT-MRAM) have significant advantages compared to conventional SRAM due
to their non-volatility, higher cell density, and scalability features. While
previous work has investigated several architectural implications of NVM for
generic applications, in this work we present DeepNVM++, a framework to
characterize, model, and analyze NVM-based caches in GPU architectures for deep
learning (DL) applications by combining technology-specific circuit-level
models and the actual memory behavior of various DL workloads. We present both
iso-capacity and iso-area performance and energy analysis for systems whose
last-level caches rely on conventional SRAM and emerging STT-MRAM and SOT-MRAM
technologies. In the iso-capacity case, STT-MRAM and SOT-MRAM provide up to
3.8x and 4.7x energy-delay product (EDP) reduction and 2.4x and 2.8x area
reduction compared to conventional SRAM, respectively. Under iso-area
assumptions, STT-MRAM and SOT-MRAM provide up to 2x and 2.3x EDP reduction and
accommodate 2.3x and 3.3x cache capacity when compared to SRAM, respectively.
We also perform a scalability analysis and show that STT-MRAM and SOT-MRAM
achieve orders-of-magnitude EDP reduction compared to SRAM for large cache
capacities. Our comprehensive cross-layer framework is demonstrated on
STT-/SOT-MRAM technologies and can be used for the characterization, modeling,
and analysis of any NVM technology for last-level caches in GPUs for DL
applications.
Comment: 12 pages, 10 figures
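The energy-delay product (EDP) figures above are ratios of energy × delay between the SRAM baseline and an NVM option. A minimal sketch of that arithmetic follows; the function names and the example numbers are hypothetical and are not data from the paper.

```python
def edp(energy_j: float, delay_s: float) -> float:
    """Energy-delay product; lower is better."""
    return energy_j * delay_s


def edp_reduction(base_energy: float, base_delay: float,
                  new_energy: float, new_delay: float) -> float:
    """Factor by which the new design reduces EDP relative to the baseline."""
    return edp(base_energy, base_delay) / edp(new_energy, new_delay)


# Hypothetical example: halving energy and cutting delay by 1.5x
# gives a 3x EDP reduction.
print(edp_reduction(2.0, 1.5, 1.0, 1.0))  # 3.0
```

Because EDP multiplies the two quantities, a technology can report a sizeable EDP reduction even when most of the gain comes from energy alone.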
XNOR-VSH: A Valley-Spin Hall Effect-based Compact and Energy-Efficient Synaptic Crossbar Array for Binary Neural Networks
Binary neural networks (BNNs) have shown immense promise for
resource-constrained edge artificial intelligence (AI) platforms as their
binarized weights and inputs can significantly reduce the compute, storage and
communication costs. Several works have explored XNOR-based BNNs using SRAMs
and nonvolatile memories (NVMs). However, these designs typically need two
bit-cells to encode signed weights, leading to an area overhead. In this paper,
we address this issue by proposing a compact, low-power in-memory computing
(IMC) scheme for XNOR-based dot products featuring signed weight encoding in a
single bit-cell. Our approach utilizes the valley-spin Hall (VSH) effect in
monolayer tungsten diselenide to design an XNOR bit-cell (named 'XNOR-VSH') with
differential storage and access-transistor-less topology. We co-optimize the
proposed VSH device and a memory array to enable robust in-memory dot product
computations between signed binary inputs and signed binary weights with sense
margin (SM) > 1 µA. Our results show that the proposed XNOR-VSH array
achieves 4.8% ~ 9.0% and 37% ~ 63% lower IMC latency and energy, respectively,
with 4% ~ 64% smaller area compared to spin-transfer-torque (STT)-MRAM and
spin-orbit-torque (SOT)-MRAM based XNOR-arrays.
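XNOR-based BNN dot products rest on a standard identity: if +1 is encoded as bit 1 and −1 as bit 0, each matching pair contributes +1 and each mismatch −1, so for length-n vectors the signed dot product equals 2 · popcount(XNOR) − n. The software sketch below illustrates only that identity; the bit encoding and function names are assumptions, and it says nothing about how the VSH array evaluates it in memory.

```python
def to_bits(signs):
    """Encode +1 as bit 1 and -1 as bit 0."""
    return [1 if s == 1 else 0 for s in signs]


def xnor_dot(input_bits, weight_bits):
    """Signed binary dot product via XNOR and popcount: 2 * matches - n."""
    n = len(input_bits)
    matches = sum(1 for a, b in zip(input_bits, weight_bits) if a == b)
    return 2 * matches - n


def xnor_dot_packed(x_word: int, w_word: int, n: int) -> int:
    """Same identity on n-bit integers packed into Python ints."""
    mask = (1 << n) - 1
    xnor = ~(x_word ^ w_word) & mask  # keep only the low n bits
    return 2 * bin(xnor).count("1") - n
```

For x = [1, −1, −1, 1] and w = [1, 1, −1, −1], both `xnor_dot` and the direct sum of products give 0.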
HALLS: An Energy-Efficient Highly Adaptable Last Level STT-RAM Cache for Multicore Systems
Spin-Transfer Torque RAM (STT-RAM) is widely considered a promising
alternative to SRAM in the memory hierarchy due to STT-RAM's non-volatility,
low leakage power, high density, and fast read speed. STT-RAM's small
feature size is particularly desirable for the last-level cache (LLC), which
typically occupies a large area of the silicon die. However, long write latency
and high write energy remain challenges to implementing STT-RAM in CPU
caches. An increasingly popular method for addressing these challenges involves
trading off non-volatility for reduced write latency and write energy by
relaxing the STT-RAM's data retention time. However, in order to maximize
energy saving potential, the cache configurations, including STT-RAM's
retention time, must be dynamically adapted to executing applications' variable
memory needs. In this paper, we propose a highly adaptable last level STT-RAM
cache (HALLS) that allows the LLC configurations and retention time to be
adapted to applications' runtime execution requirements. We also propose
low-overhead runtime tuning algorithms to dynamically determine the best
(lowest energy) cache configurations and retention times for executing
applications. Compared to prior work, HALLS reduced the average energy
consumption by 60.57% in a quad-core system, while introducing marginal latency
overhead.
Comment: To appear in IEEE Transactions on Computers (TC)
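The abstract does not describe HALLS's tuning algorithms, so the following is only a hedged illustration of runtime tuning in general: exhaustively evaluate a small, hypothetical space of LLC configurations (capacity, associativity, retention time) with caller-supplied energy and latency estimates, and keep the lowest-energy configuration that meets a latency budget. The `LLCConfig` type, the `tune` function and the toy model in the usage lines are all invented for illustration. The toy model mimics the trade-off described above: shorter retention makes writes cheaper, but retention shorter than the data's useful lifetime forces costly refetches.

```python
from itertools import product
from typing import Callable, NamedTuple


class LLCConfig(NamedTuple):
    size_kb: int
    assoc: int
    retention_us: float


def tune(sizes, assocs, retentions,
         energy_of: Callable[[LLCConfig], float],
         latency_of: Callable[[LLCConfig], float],
         latency_budget: float) -> LLCConfig:
    """Return the lowest-energy configuration whose latency fits the budget."""
    candidates = [LLCConfig(s, a, r) for s, a, r in product(sizes, assocs, retentions)]
    feasible = [c for c in candidates if latency_of(c) <= latency_budget]
    if not feasible:
        raise ValueError("no configuration meets the latency budget")
    return min(feasible, key=energy_of)


# --- Hypothetical usage with a toy cost model (not from HALLS) ---
LIFETIME_US = 50.0  # assumed average time a block must stay valid in the LLC


def toy_energy(c: LLCConfig) -> float:
    write = 1.0 + 0.01 * c.retention_us                      # longer retention: costlier writes
    refetch = 5.0 if c.retention_us < LIFETIME_US else 0.0   # data expires too early
    size_cost = 0.001 * c.size_kb                            # larger arrays cost more energy
    return write + refetch + size_cost


def toy_latency(c: LLCConfig) -> float:
    return 10.0 + 0.5 * c.assoc


best = tune([512, 1024], [4, 8], [10.0, 100.0, 1000.0],
            toy_energy, toy_latency, latency_budget=13.0)
```

Exhaustive search is only practical for a small candidate space; a real runtime tuner would typically prune the space or sample configurations over execution intervals to keep overhead low.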
Area-Efficient Spin-Orbit Torque Magnetic Random-Access Memory
Spin-orbit torque magnetic random-access memory (SOT-MRAM) has shown promising potential for realizing reliable, high-speed, and energy-efficient on-chip memory. However, conventional SOT-MRAM requires two access transistors per cell, which limits its use in high-density memories. Various architectures have therefore been proposed in the literature to improve the area efficiency of SOT-MRAM. In this chapter, these proposals are divided into two categories: non-diode-based and diode-based SOT-MRAM cells. The non-diode-based proposals may achieve a 1-bit effective area saving of up to 50% compared to conventional SOT-MRAM, whereas the diode-based designs may achieve a saving of up to 75%. However, the area saving may come at the cost of higher energy consumption and reliability penalties. This chapter therefore presents the various proposals in the literature, highlighting the pros and cons of each design, and discusses the technology requirements for realizing them. Finally, the designs are evaluated from both cell-level and system-level perspectives.
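To make the percentages concrete, the 1-bit effective area saving can be read as follows. The symbols are introduced here for illustration and are not necessarily the chapter's notation.

```latex
A_{\text{bit}} = \frac{A_{\text{array}}}{N_{\text{bits}}},
\qquad
\text{saving} = 1 - \frac{A_{\text{bit}}^{\text{new}}}{A_{\text{bit}}^{\text{conv}}}
\quad\Longrightarrow\quad
\begin{cases}
\text{saving} = 50\% & \Rightarrow A_{\text{bit}}^{\text{new}} = 0.5\,A_{\text{bit}}^{\text{conv}},\\[2pt]
\text{saving} = 75\% & \Rightarrow A_{\text{bit}}^{\text{new}} = 0.25\,A_{\text{bit}}^{\text{conv}}.
\end{cases}
```

In other words, a 50% saving fits two bits into the footprint of one conventional cell, and a 75% saving fits four.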
Design Space Exploration and Comparative Evaluation of Memory Technologies for Synaptic Crossbar Arrays: Device-Circuit Non-Idealities and System Accuracy
In-memory computing (IMC) utilizing synaptic crossbar arrays is promising for
deep neural networks to attain high energy efficiency and integration density.
Towards that end, various CMOS and post-CMOS technologies have been explored as
promising synaptic device candidates which include SRAM, ReRAM, FeFET,
SOT-MRAM, etc. However, each of these technologies has its own pros and cons,
which need to be comparatively evaluated in the context of synaptic array
designs. For a fair comparison, such an analysis must carefully optimize each
technology, specifically for synaptic crossbar design accounting for device and
circuit non-idealities in crossbar arrays such as variations, wire resistance,
driver/sink resistance, etc. In this work, we perform a comprehensive design
space exploration and comparative evaluation of different technologies at the 7nm
technology node for synaptic crossbar arrays, in the context of IMC robustness
and system accuracy. First, we integrate different technologies into a
cross-layer simulation flow based on physics-based models of synaptic devices
and interconnects. Second, we optimize both technology-agnostic design knobs
such as input encoding and ON-resistance as well as technology-specific design
parameters including ferroelectric thickness in FeFET and MgO thickness in
SOT-MRAM. Our optimization methodology accounts for the implications of device-
and circuit-level non-idealities on the system-level accuracy for each
technology. Finally, based on the optimized designs, we obtain inference
results for ResNet-20 on the CIFAR-10 dataset and show that FeFET-based crossbar
arrays achieve the highest accuracy due to their compactness, low leakage, and
high ON/OFF current ratio.
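An ideal crossbar computes column currents as I_j = Σ_i V_i · G_ij. The sketch below adds wire resistance using a deliberately crude first-order assumption: each cell sits in series with the wire segments between its row driver (left edge) and its column sink (bottom edge), and current sharing between cells is ignored. It is not the physics-based cross-layer flow described above. The function name, `r_wire` and the array orientation are illustrative assumptions, and every conductance must be positive.

```python
def column_currents(v_in, g, r_wire=0.0):
    """Column currents of a crossbar, I_j = sum_i V_i * G_ij (ideal if r_wire == 0).

    v_in: row input voltages (V); g: cell conductances g[row][col] (S, > 0);
    r_wire: resistance of one wire segment (ohms). First-order model only:
    each cell's series wire resistance depends on its position, and the
    coupling between cells through shared wires is ignored."""
    n_rows, n_cols = len(g), len(g[0])
    currents = []
    for j in range(n_cols):
        total = 0.0
        for i in range(n_rows):
            # Segments from the row driver to column j, plus from row i
            # down to the column sink at the bottom edge.
            r_series = r_wire * ((j + 1) + (n_rows - i))
            g_eff = 1.0 / (1.0 / g[i][j] + r_series)
            total += v_in[i] * g_eff
        currents.append(total)
    return currents
```

Since g_eff / g = 1 / (1 + g · r_series), the relative error shrinks as cell resistance grows, which is one reason ON-resistance is worth optimizing as a design knob alongside technology-specific parameters.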