Exploring Spin-transfer-torque devices and memristors for logic and memory applications
As scaling CMOS devices is approaching its physical limits, researchers have begun exploring newer devices and architectures to replace CMOS.
Due to their non-volatility and high density, Spin Transfer Torque (STT) devices are among the most prominent candidates for logic and memory applications. In this research, we first considered a new logic style called All Spin Logic (ASL). Despite its advantages, ASL consumes a large amount of static power; thus, several optimizations can be performed to address this issue. We developed a systematic methodology to perform the optimizations to ensure stable operation of ASL.
Second, we investigated the reliable design of STT-MRAM bit-cells and addressed the conflicting read and write requirements, which result in overdesign of the bit-cells. Further, a Device/Circuit/Architecture co-design framework was developed to optimize the STT-MRAM devices by exploring the design space while jointly considering yield enhancement techniques at different levels of abstraction.
Recent advancements in the development of memristive devices have opened new opportunities for hardware implementation of non-Boolean computing. To this end, the suitability of memristive devices for swarm intelligence algorithms has enabled researchers to solve a maze in hardware. In this research, we utilized swarm intelligence of memristive networks to perform image edge detection. First, we proposed a hardware-friendly algorithm for image edge detection based on ant colony. Next, we designed the image edge detection algorithm using memristive networks
Non-Volatile Memory Adaptation in Asynchronous Microcontroller for Low Leakage Power and Fast Turn-on Time
This dissertation presents an MSP430 microcontroller implementation using Multi-Threshold NULL Convention Logic (MTNCL) methodology combined with an asynchronous non-volatile magnetic random-access-memory (RAM) to achieve low leakage power and fast turn-on. This asynchronous non-volatile RAM is designed with a Spin-Transfer Torque (STT) memory device model and CMOS transistors in a 65 nm technology. A self-timed Quasi-Delay-Insensitive 1 KB STT RAM is designed with an MTNCL interface and handshaking protocol. A replica methodology is implemented to handle write operation completion detection for long state-switching delays of the STT memory device. The MTNCL MSP430 core is integrated with the STT RAM to create a fully asynchronous non-volatile microcontroller.
The MSP430 architecture, the MTNCL design methodology, and the STT RAM’s low power consumption and non-volatility together yield multiple advantages in the MTNCL-STT RAM system for a variety of applications. For comparison, a baseline system with the same MTNCL core combined with an asynchronous CMOS RAM is designed and tested. Schematic simulation results demonstrate that the MTNCL-CMOS RAM system presents advantages in execution time and active energy over the MTNCL-STT RAM system; however, the MTNCL-STT RAM system presents unmatched advantages such as negligible leakage power, zero overhead memory power failure handling, and fast system turn-on
Algorithm-Directed Crash Consistence in Non-Volatile Memory for HPC
Fault tolerance is one of the major design goals for HPC. The emergence of
non-volatile memories (NVM) provides a solution to build fault tolerant HPC.
Data in NVM-based main memory are not lost when the system crashes because of
the non-volatile nature of NVM. However, because of volatile caches, data
must be logged and explicitly flushed from caches into NVM to ensure
consistence and correctness before crashes, which can cause large runtime
overhead.
In this paper, we introduce an algorithm-based method to establish crash
consistence in NVM for HPC applications. We slightly extend application data
structures or sparsely flush cache blocks, which introduces negligible runtime
overhead. Such extension or cache flushing allows us to use algorithm knowledge
to reason about data consistence or correct inconsistent data when the
application crashes. We demonstrate the effectiveness of our method for three
algorithms, including an iterative solver, dense matrix multiplication, and
Monte-Carlo simulation. Based on comprehensive performance evaluation on a
variety of test environments, we demonstrate that our approach has very small
runtime overhead (at most 8.2% and less than 3% in most cases), much smaller
than that of traditional checkpointing, while having the same or less
recomputation cost.
Comment: 12 pages
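The idea of extending a data structure so that algorithm knowledge can reveal inconsistent data after a crash can be illustrated with a classic checksum scheme for matrix multiplication. The sketch below is an assumption made for illustration, not the method described in the abstract: the function names are hypothetical, the matrices are plain Python lists, and the "crash" is simulated by overwriting one entry as if its update had never left a volatile cache.

```python
def with_checksum_row(a):
    """Return a copy of matrix a with one extra row holding each column's sum."""
    cols = len(a[0])
    return [row[:] for row in a] + [[sum(row[j] for row in a) for j in range(cols)]]


def matmul(a, b):
    """Plain dense product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def inconsistent_columns(c_full, tol=1e-9):
    """Indices of columns whose data rows no longer sum to the checksum row."""
    *data, checksum = c_full
    return [j for j, expected in enumerate(checksum)
            if abs(sum(row[j] for row in data) - expected) > tol]


def recompute_columns(c_full, a, b, cols):
    """Recompute only the listed columns of the data rows of c_full, in place."""
    for j in cols:
        for i, row in enumerate(a):
            c_full[i][j] = sum(row[p] * b[p][j] for p in range(len(b)))


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c_full = matmul(with_checksum_row(a), b)  # last row is the checksum row
c_full[1][0] = 0                          # hypothetical update lost in a volatile cache
bad = inconsistent_columns(c_full)        # only column 0 fails the checksum
recompute_columns(c_full, a, b, bad)      # repair without redoing the whole product
```

Because the checksum row of the product equals the column sums of the result, a mismatch localizes the stale data to specific columns, so only those columns need recomputation. The sketch assumes the checksum row itself reached persistent memory intact.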
Magnetic domain walls : Types, processes and applications
Domain walls (DWs) in magnetic nanowires are promising candidates for a
variety of applications including Boolean/unconventional logic, memories,
in-memory computing as well as magnetic sensors and biomagnetic
implementations. They show rich physical behaviour and are controllable using a
number of methods including magnetic fields, charge and spin currents and
spin-orbit torques. In this review, we detail types of domain walls in
ferromagnetic nanowires and describe processes of manipulating their state. We
look at the state of the art of DW applications and give our take on their
current status, technological feasibility and challenges.
Comment: 32 pages, 25 figures, review paper
Accelerating Time Series Analysis via Processing using Non-Volatile Memories
Time Series Analysis (TSA) is a critical workload for consumer-facing
devices. Accelerating TSA is vital for many domains as it enables the
extraction of valuable information and the prediction of future events. The
state-of-the-art algorithm in TSA is the subsequence Dynamic Time Warping
(sDTW) algorithm. However, sDTW's computational complexity increases
quadratically with the time series' length, resulting in two performance
implications. First, the amount of data parallelism available is significantly
higher than the small number of processing units enabled by commodity systems
(e.g., CPUs). Second, sDTW is bottlenecked by memory because it 1) has low
arithmetic intensity and 2) incurs a large memory footprint. To tackle these
two challenges, we leverage Processing-using-Memory (PuM) by performing in-situ
computation where data resides, using the memory cells. PuM provides a
promising solution to alleviate data movement bottlenecks and exposes immense
parallelism.
In this work, we present MATSA, the first MRAM-based Accelerator for Time
Series Analysis. The key idea is to exploit magneto-resistive memory crossbars
to enable energy-efficient and fast time series computation in memory. MATSA
provides the following key benefits: 1) it leverages high levels of parallelism
in the memory substrate by exploiting column-wise arithmetic operations, and 2)
it significantly reduces the data movement costs by performing computation using
the memory cells. We evaluate three versions of MATSA to match the requirements
of different environments (e.g., embedded, desktop, or HPC computing) based on
MRAM technology trends. We perform a design space exploration and demonstrate
that our HPC version of MATSA can improve performance by 7.35x/6.15x/6.31x and
energy efficiency by 11.29x/4.21x/2.65x over server CPU, GPU and PNM
architectures, respectively
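For readers unfamiliar with sDTW, the following minimal software sketch shows the standard subsequence DTW recurrence and why its cost grows with the product of the query and reference lengths. It is a plain CPU reference under stated assumptions (absolute difference as the local cost, non-empty inputs, a hypothetical function name) and says nothing about how MATSA maps the computation onto MRAM crossbars.

```python
import math


def sdtw_distance(query, reference):
    """Return (best_cost, end_index) of the best match of query inside reference."""
    n, m = len(query), len(reference)
    # prev[j] holds row i-1 of the cost table; row 0 is all zeros so that a
    # match may start at any position of the reference at no cost.
    prev = [0.0] * (m + 1)
    for i in range(1, n + 1):
        # curr[0] stays infinite: a non-empty query prefix cannot match nothing.
        curr = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(query[i - 1] - reference[j - 1])
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    # The match may end anywhere, so take the minimum over the last row.
    best_end = min(range(1, m + 1), key=lambda j: prev[j])
    return prev[best_end], best_end - 1


best_cost, end_index = sdtw_distance([1, 2, 3], [0, 0, 1, 2, 3, 0, 0])
```

Each table cell needs only one subtraction and a three-way minimum but reads three neighbouring cells, which is consistent with the low arithmetic intensity noted in the abstract. All cells along an anti-diagonal are mutually independent, which is one source of the available data parallelism.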
Shape-engineered ferromagnets and micromagnetic simulation techniques for spin-transfer-torque random access memory
Spin-transfer-torque random access memory (STTRAM) has received great attention as a prospective universal memory due to high speed read and write capabilities, scalability to smaller technology nodes and non-volatile data retention. Two major factors that could limit the performance of large scale STTRAM arrays are the high switching current and the stochastic switching behavior. In this work, possible routes to mitigate these issues have been explored and new techniques have been proposed to estimate the reliability of the write process. The large area of the selection transistor required to support high switching current impacts the bit storage density of an STTRAM memory array. To increase the bit storage density, a multi-state STTRAM cell employing a cross-shaped ferromagnet was proposed previously. Here, the spin-transfer-torque (STT) driven magnetization dynamics of the cross-shaped ferromagnet is revisited. As a low power alternative, a voltage controlled magnetic anisotropy (VCMA) based writing scheme is studied. Trade-offs and limitations of the VCMA-induced switching over STT are also discussed. In the next part of this dissertation, the magnetic properties and magnetization process of epitaxial chromium telluride thin films have been studied. The presence of strong perpendicular magnetic anisotropy in this material makes it an attractive choice for device applications. In this work, anisotropy energies of chromium telluride thin films have been estimated from magnetization measurements. The magnetization reversal process is then studied using analytical models as well as micromagnetic simulations. The last part of this work focuses on the write error rates (WER) of STTRAM. The stochastic write process of STTRAM at finite temperatures gives rise to write errors when a bit fails to switch within the duration of the write pulse. Ultra-low WERs on the scale of 10⁻⁹ or less are desired for practical applications.
Micromagnetic simulations are required to capture spatially-incoherent magnetization dynamics inside a ferromagnet, which may affect the WER. In this work, using the techniques of rare event enhancement, reliable calculation of WERs to 10⁻⁹ is demonstrated while keeping the computational effort to a minimum. Employing rare-event-enhanced micromagnetic simulations, WERs of both perpendicular and in-plane STTRAM bits are calculated and effects of spatially-incoherent excitations on the WER slopes are discussed.
Electrical and Computer Engineering
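To see why rare event enhancement matters at these error rates, consider the cost of a brute-force estimate. The sketch below uses only the standard binomial result that counting failures over N independent write attempts gives an estimate with relative standard error sqrt((1 - p) / (N p)) for a true error rate p. The function name is hypothetical, and the calculation illustrates the naive approach, not the technique used in the dissertation.

```python
import math


def trials_for_relative_error(wer, rel_err):
    """Independent write trials needed so that a brute-force (binomial) WER
    estimate has a relative standard error of rel_err."""
    # Solve sqrt((1 - wer) / (N * wer)) = rel_err for N and round up.
    return math.ceil((1.0 - wer) / (wer * rel_err ** 2))


# Trials needed to resolve a WER of 1e-9 to within about 10 percent.
n_trials = trials_for_relative_error(1e-9, 0.1)
```

For a WER of 10⁻⁹ and a 10 percent relative error this comes to roughly 10¹¹ simulated write pulses, each of which would be a full micromagnetic run, which is why a direct count is impractical.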