Computing in Memory with Spin-Transfer Torque Magnetic RAM
In-memory computing is a promising approach to addressing the
processor-memory data transfer bottleneck in computing systems. We propose
Spin-Transfer Torque Compute-in-Memory (STT-CiM), a design for in-memory
computing with Spin-Transfer Torque Magnetic RAM (STT-MRAM). The unique
properties of spintronic memory allow multiple wordlines within an array to be
simultaneously enabled, opening up the possibility of directly sensing
functions of the values stored in multiple rows using a single access. We
propose modifications to STT-MRAM peripheral circuits that leverage this
principle to perform logic, arithmetic, and complex vector operations. We
address the challenge of reliable in-memory computing under process variations
by extending ECC schemes to detect and correct errors that occur during CiM
operations. We also address the question of how STT-CiM should be integrated
within a general-purpose computing system. To this end, we propose
architectural enhancements to processor instruction sets and on-chip buses that
enable STT-CiM to be utilized as a scratchpad memory. Finally, we present data
mapping techniques to increase the effectiveness of STT-CiM. We evaluate
STT-CiM using a device-to-architecture modeling framework, and integrate
cycle-accurate models of STT-CiM with a commercial processor and on-chip bus
(Nios II and Avalon from Intel). Our system-level evaluation shows that STT-CiM
provides system-level performance improvements of 3.93x on average (up to
10.4x), and concurrently reduces memory system energy by 3.83x on average (up to
12.4x).
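The multi-row sensing idea in the abstract above can be illustrated with a minimal sketch. It assumes two wordlines are enabled at once and the summed bitline current is compared against two reference levels. The current values, the bit-to-state mapping, and the function names are hypothetical and are not taken from the paper.

```python
# Hypothetical cell currents in arbitrary units (assumed, not from the paper).
I_P = 2.0   # assumed: cell storing 1 (parallel, low-resistance state)
I_AP = 1.0  # assumed: cell storing 0 (anti-parallel, high-resistance state)


def cell_current(bit):
    """Current contributed by one enabled cell on the shared bitline."""
    return I_P if bit else I_AP


def sense_two_rows(bit_a, bit_b):
    """Sense OR, AND and XOR of two cells from one summed-current access."""
    total = cell_current(bit_a) + cell_current(bit_b)
    # Reference between the "00" level and the "01/10" level.
    ref_or = (2 * I_AP + (I_AP + I_P)) / 2
    # Reference between the "01/10" level and the "11" level.
    ref_and = ((I_AP + I_P) + 2 * I_P) / 2
    or_bit = total > ref_or
    and_bit = total > ref_and
    xor_bit = or_bit and not and_bit
    return int(or_bit), int(and_bit), int(xor_bit)


def sense_two_words(row_a, row_b):
    """Apply the two-row sensing bit by bit across two stored words."""
    return [sense_two_rows(a, b) for a, b in zip(row_a, row_b)]
```

For example, `sense_two_words([0, 1, 1], [1, 1, 0])` yields one (OR, AND, XOR) triple per bit position. In a real array the spacing between the three current levels shrinks under process variations, which is the reliability concern the abstract addresses with ECC.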
Write error rate of spin-transfer-torque random access memory including micromagnetic effects using rare event enhancement
Spin-transfer-torque random access memory (STT-RAM) is a promising candidate
for the next-generation of random-access-memory due to improved scalability,
read-write speeds and endurance. However, the write pulse duration must be long
enough to ensure a low write error rate (WER), the probability that a bit will
remain unswitched after the write pulse is turned off, in the presence of
stochastic thermal effects. WERs on the scale of 10 or lower are
desired. Within a macrospin approximation, WERs can be calculated analytically
using the Fokker-Planck method to this point and beyond. However, dynamic
micromagnetic effects within the bit can affect and lead to faster switching.
Such micromagnetic effects can be addressed via numerical solution of the
stochastic Landau-Lifshitz-Gilbert-Slonczewski (LLGS) equation. However,
determining WERs approaching 10 would require well over 10 such
independent simulations, which is infeasible. In this work, we explore
calculation of WER using "rare event enhancement" (REE), an approach that has
been used for Monte Carlo simulation of other systems where rare events
nevertheless remain important. Using a prototype REE approach tailored to the
STT-RAM switching physics, we demonstrate reliable calculation of a WER to
10 with sets of only approximately 10 ongoing stochastic LLGS
simulations, and the apparent ability to go further.
Comment: 7 pages, 5 figures
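The claim that brute-force simulation is infeasible for very small error rates follows from binomial statistics. The sketch below estimates how many independent trials a plain Monte Carlo estimate would need. The function name and the example probability are illustrative assumptions, not values from the paper.

```python
import math


def trials_needed(p, rel_err):
    """Trials needed for a plain Monte Carlo estimate of probability p.

    The estimator's relative standard error is sqrt((1 - p) / (p * N)),
    so N = (1 - p) / (p * rel_err**2) trials reach the requested rel_err.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if rel_err <= 0.0:
        raise ValueError("rel_err must be positive")
    return math.ceil((1.0 - p) / (p * rel_err ** 2))


# Illustrative only: a rare event with probability 1e-6, estimated to
# within a 10% relative standard error, needs roughly 1e8 trials.
example_trials = trials_needed(1e-6, 0.1)
```

The required count grows as 1/p, which is why methods that concentrate sampling on the rare trajectories become attractive for small probabilities.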
Numerical Fokker-Planck study of stochastic write error slope in spin torque switching
This paper analyzes write errors in spin torque switching due to thermal
fluctuations in a system with Perpendicular Magnetic Anisotropy (PMA). Prior
analytical and numerical methods are summarized, and a physics-based Fokker-Planck
Equation (FPE) is chosen for its computational efficiency and broad applicability
to all switching regimes. The relation between write error slope and material
parameters is discussed in detail to enable better device engineering and
optimization. Finally, a 2D FPE tool is demonstrated that extends the
applicability of FPE to write error in non-PMA systems with built-in asymmetry.
Comment: 7 pages, 8 figures
Circuit Theory for SPICE of Spintronic Integrated Circuits
We present a theoretical and a numerical formalism for analysis and design of
spintronic integrated circuits (SPINICs). The formalism encompasses a
generalized circuit theory for spintronic integrated circuits based on
nanomagnetic dynamics and spin transport. We propose an extension to the
Modified Nodal Analysis technique for the analysis of spin circuits based on
the recently developed spin conduction matrices. We demonstrate the
applicability of the framework using an example spin logic circuit described
using spin Netlists.
Comment: 14 pages, 11 figures; added fig. 2; added citations; modified title
to emphasize SPICE; Results unchanged
Nanomagnetic Logic and Magnetization Switching Dynamics in Spin Torque Majority Gates
Spin torque majority gates (STMGs) are modeled and several regimes of magnetization
switching (some leading to failure) are discovered. The switching speed and
noise margins are determined for STMGs and an adder based on them. With a switching
time of 3 ns at a current of 80 µA, the adder computational throughput is
comparable to that of a CMOS adder.
Comment: 4 pages, 14 figures, IEEE International Magnetic Conference Technical
Digest, BT-08 (2012)
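As a purely logical illustration of how an adder can be built from majority gates, the sketch below uses a standard majority/inverter identity for the full adder. This decomposition is an assumption for illustration and is not necessarily the circuit studied in the paper. It models only Boolean behavior, not magnetization dynamics.

```python
def maj3(a, b, c):
    """Three-input majority: 1 when at least two inputs are 1."""
    return 1 if a + b + c >= 2 else 0


def full_adder(a, b, cin):
    """One-bit full adder built from three majority gates and inversions."""
    cout = maj3(a, b, cin)
    # Standard identity: sum = MAJ(NOT cout, cin, MAJ(a, b, NOT cin)).
    s = maj3(1 - cout, cin, maj3(a, b, 1 - cin))
    return s, cout


def ripple_add(x, y, width):
    """Add two unsigned integers bit by bit; returns (sum mod 2**width, carry)."""
    carry = 0
    result = 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry
```

In this decomposition the carry path costs one majority evaluation per bit, so the ripple delay scales with the gate switching time multiplied by the word width.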
Encoding Neural and Synaptic Functionalities in Electron Spin: A Pathway to Efficient Neuromorphic Computing
Present day computers expend orders of magnitude more computational resources
to perform various cognitive and perception related tasks that humans routinely
perform every day. This has recently resulted in a seismic shift in the field of
computation where research efforts are being directed to develop a
neurocomputer that attempts to mimic the human brain by nanoelectronic
components and thereby harness its efficiency in recognition problems. Bridging
the gap between neuroscience and nanoelectronics, this paper attempts to
provide a review of the recent developments in the field of spintronic device
based neuromorphic computing. Description of various spin-transfer torque
mechanisms that can be potentially utilized for realizing device structures
mimicking neural and synaptic functionalities is provided. A cross-layer
perspective extending from the device to the circuit and system level is
presented to envision the design of an All-Spin neuromorphic processor enabled
with on-chip learning functionalities. A device-circuit-algorithm co-simulation
framework calibrated to experimental results suggests that such All-Spin
neuromorphic systems can potentially achieve almost two orders of magnitude
energy improvement in comparison to state-of-the-art CMOS implementations.
Comment: The paper will appear in a future issue of Applied Physics Reviews
From materials to systems: a multiscale analysis of nanomagnetic switching
With the increasing demand for low-power electronics, nanomagnetic devices
have emerged as strong potential candidates to complement present day
transistor technology. A variety of novel switching effects such as spin torque
and giant spin Hall offer scalable ways to manipulate nano-sized magnets.
However, the low intrinsic energy cost of switching spins is often compromised
by the energy consumed in the overhead circuitry in creating the necessary
switching fields. Scaling brings in added concerns such as the ability to
distinguish states (readability) and to write information without spontaneous
backflips (reliability). A viable device must ultimately navigate a complex
multi-dimensional material and design space defined by volume, energy budget,
speed and a target read-write-retention error. In this paper, we review the
major challenges facing nanomagnetic devices and present a multi-scale
computational framework to explore possible innovations at different levels
(material, device, or circuit), along with a holistic understanding of their
overall energy-delay-reliability tradeoff.
Comment: Submitted to Journal of Computational Electronics Special Issue
"Computational Electronics of Emerging Memory Elements"
Multi-bit MRAM storage cells utilizing serially connected perpendicular magnetic tunnel junctions
Serial connection of multiple memory cells using perpendicular magnetic
tunnel junctions (pMTJ) is proposed as a way to increase magnetic random access
memory (MRAM) storage density. A multi-bit storage element is designed using
pMTJs fabricated on a single wafer stack, with serial connections realized
using top-to-bottom vias. A tunneling magnetoresistance effect above 130%, current
induced magnetization switching in zero external magnetic field and stability
diagram analysis of single, two-bit and three-bit cells are presented together
with thermal stability. The proposed design is easy to manufacture and can lead
to increased capacity of future MRAM devices.
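To see why serially connected junctions can encode several bits, the sketch below enumerates the total resistance of a series chain for every combination of junction states. It assumes the usual definition TMR = (R_AP - R_P) / R_P. The parallel-state resistances, the bit-to-state mapping, and the function name are hypothetical, and the TMR value of 1.3 is only an illustrative stand-in for the "above 130%" figure in the abstract.

```python
from itertools import product


def series_levels(r_parallel, tmr):
    """Map each state tuple to the total series resistance.

    r_parallel: parallel-state resistance of each junction (hypothetical values).
    tmr: tunneling magnetoresistance ratio, (R_AP - R_P) / R_P.
    A state bit of 1 is assumed to mean the anti-parallel configuration.
    """
    levels = {}
    for state in product((0, 1), repeat=len(r_parallel)):
        total = sum(rp * (1 + tmr * bit) for rp, bit in zip(r_parallel, state))
        levels[state] = total
    return levels


# Two junctions with different assumed parallel resistances (arbitrary units).
two_bit = series_levels([1.0, 2.0], 1.3)
```

With these assumed values the four states give four distinct resistance levels. If the junctions had identical resistances, the (0, 1) and (1, 0) states would collide, so some asymmetry between junctions is needed for every state to be readable.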
Spin torque building blocks
The discovery of the spin torque effect has made magnetic nanodevices
realistic candidates for active elements of memory devices and applications.
Magnetoresistive effects allow the read-out of increasingly small magnetic
bits, and the spin torque provides an efficient tool to manipulate - precisely,
rapidly and at low energy cost - the magnetic state, which is in turn the
central information medium of spintronic devices. By keeping the same magnetic
stack, but by tuning a device's shape and bias conditions, the spin torque can
be engineered to build a variety of advanced magnetic nanodevices. Here we show
that by assembling these nanodevices as building blocks with different
functionalities, novel types of computing architectures can be envisaged. We
focus in particular on recent concepts such as magnonics and spintronic neural
networks.
Energy-Efficient Runtime Adaptable L1 STT-RAM Cache Design
Much research has shown that applications have variable runtime cache
requirements. In the context of the increasingly popular Spin-Transfer Torque
RAM (STT-RAM) cache, the retention time, which defines how long the cache can
retain a cache block in the absence of power, is one of the most important
cache requirements that may vary for different applications. In this paper, we
propose a Logically Adaptable Retention Time STT-RAM (LARS) cache that allows
the retention time to be dynamically adapted to applications' runtime
requirements. LARS cache comprises multiple STT-RAM units with different
retention times, with only one unit being used at a given time. LARS
dynamically determines which STT-RAM unit to use during runtime, based on
executing applications' needs. As an integral part of LARS, we also explore
different algorithms to dynamically determine the best retention time based on
different cache design tradeoffs. Our experiments show that by adapting the
retention time to different applications' requirements, LARS cache can reduce
the average cache energy by 25.31%, compared to prior work, with minimal
overheads.
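The selection step described in the abstract above, choosing one of several units with different retention times at runtime, could look like the sketch below. The policy shown (pick the shortest retention time that still covers an observed block lifetime) is an assumption for illustration. The abstract says the authors explore different algorithms but does not describe them here, and all names and values are hypothetical.

```python
def pick_unit(retention_times_us, required_lifetime_us):
    """Return the index of the STT-RAM unit to enable.

    Chooses the unit with the shortest retention time that is still at least
    required_lifetime_us. If no unit is long enough, falls back to the unit
    with the longest retention time.
    """
    if not retention_times_us:
        raise ValueError("at least one unit is required")
    candidates = [
        (t, i) for i, t in enumerate(retention_times_us)
        if t >= required_lifetime_us
    ]
    if candidates:
        return min(candidates)[1]
    return max(range(len(retention_times_us)),
               key=lambda i: retention_times_us[i])


# Hypothetical units with retention times in microseconds.
units = [25, 50, 75, 100]
chosen = pick_unit(units, 40)  # selects the 50 us unit
```

The rationale for preferring the shortest sufficient retention time is that relaxed retention typically lowers STT-RAM write energy and latency, while a retention time that is too short risks blocks expiring before they are reused.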
- …