258 research outputs found
A Novel Self-Reference Sensing Scheme for MLC MRAM
Magnetic random access memory (MRAM) is a promising nonvolatile memory technology targeted on on-chip or embedded applications. Storage density is one of the major design concerns of MRAM. In recent years, many researches have been performed to improve the storage density and enhance the scalability of MRAM, such as shrinking the size and switching energy of magnetic tunneling junction (MTJ) devices. Recently, a tri-bit cell (TBC) structure was proposed to enlarge the storage density of MRAM. The typical sensing scheme for TBC sensing is suffering from large sensing latency and limited margin. In this work, a new self-reference sensing scheme for the TBC MRAM cell was proposed based on its unique property referred as resistance levels ordering. Simulation results show that compared to conventional design, the proposed self-reference scheme achieves on average 61% saving on sensing latency while also demonstrating significantly enhanced tolerance to device parametric variations
Design of robust spin-transfer torque magnetic random access memories for ultralow power high performance on-chip cache applications
Spin-transfer torque magnetic random access memories (STT-MRAMs) based on magnetic tunnel junction (MTJ) has become the leading candidate for future universal memory technology due to its potential for low power, non-volatile, high speed and extremely good endurance. However, conflicting read and write requirements exist in STT-MRAM technology because the current path during read and write operations are the same. Read and write failures of STT-MRAMs are degraded further under process variations. The focus of this dissertation is to optimize the yield of STT- MRAMs under process variations by employing device-circuit-architecture co-design techniques. A devices-to-systems simulation framework was developed to evaluate the effectiveness of the techniques proposed in this dissertation. An optimization methodology for minimizing the failure probability of 1T-1MTJ STT-MRAM bit-cell by proper selection of bit-cell configuration and access transistor sizing is also proposed. A failure mitigation technique using assistsin 1T-1MTJ STT-MRAM bit-cells is also proposed and discussed. Assist techniques proposed in this dissertation to mitigate write failures either increase the amount of current available to switch the MTJ during write or decrease the required current to switch the MTJ. These techniques achieve significant reduction in bit-cell area and write power with minimal impact on bit-cell failure probability and read power. However, the proposed write assist techniques may be less effective in scaled STT-MRAM bit-cells. Furthermore, read failures need to be overcome and hence, read assist techniques are required. It has been experimentally demonstrated that a class of materials called multiferroics can enable manipulation of magnetization using electric fields via magnetoelectric effects. A read assist technique using an MTJ structure incorporating multiferroic materials is proposed and analyzed. It was found that it is very difficult to overcome the fundamental design issues with 1T-1MTJ STT-MRAM due to the two-terminal nature of the MTJ. Hence, multi-terminal MTJ structures consisting of complementary polarized pinned layers are proposed. Analysis of the proposed MTJ structures shows significant improvement in bit-cell failures. Finally, this dissertation explores two system-level applications enabled by STT-MRAMs, and shows that device-circuit-architecture co-design of STT-MRAMs is required to fully exploit its benefits
Reliable Low-Power High Performance Spintronic Memories
Moores Gesetz folgend, ist es der Chipindustrie in den letzten fĂĽnf Jahrzehnten gelungen, ein
explosionsartiges Wachstum zu erreichen. Dies hatte ebenso einen exponentiellen Anstieg der
Nachfrage von Speicherkomponenten zur Folge, was wiederum zu speicherlastigen Chips in
den heutigen Computersystemen fĂĽhrt. Allerdings stellen traditionelle on-Chip Speichertech-
nologien wie Static Random Access Memories (SRAMs), Dynamic Random Access Memories
(DRAMs) und Flip-Flops eine Herausforderung in Bezug auf Skalierbarkeit, Verlustleistung
und Zuverlässigkeit dar. Eben jene Herausforderungen und die überwältigende Nachfrage
nach höherer Performanz und Integrationsdichte des on-Chip Speichers motivieren Forscher,
nach neuen nichtflĂĽchtigen Speichertechnologien zu suchen. Aufkommende spintronische Spe-
ichertechnologien wie Spin Orbit Torque (SOT) und Spin Transfer Torque (STT) erhielten
in den letzten Jahren eine hohe Aufmerksamkeit, da sie eine Reihe an Vorteilen bieten. Dazu
gehören Nichtflüchtigkeit, Skalierbarkeit, hohe Beständigkeit, CMOS Kompatibilität und Unan-
fälligkeit gegenüber Soft-Errors. In der Spintronik repräsentiert der Spin eines Elektrons dessen
Information. Das Datum wird durch die Höhe des Widerstandes gespeichert, welche sich durch
das Anlegen eines polarisierten Stroms an das Speichermedium verändern lässt. Das Prob-
lem der statischen Leistung gehen die Speichergeräte sowohl durch deren verlustleistungsfreie
Eigenschaft, als auch durch ihr Standard- Aus/Sofort-Ein Verhalten an. Nichtsdestotrotz sind
noch andere Probleme, wie die hohe Zugriffslatenz und die Energieaufnahme zu lösen, bevor
sie eine verbreitete Anwendung finden können. Um diesen Problemen gerecht zu werden, sind
neue Computerparadigmen, -architekturen und -entwurfsphilosophien notwendig.
Die hohe Zugriffslatenz der Spintroniktechnologie ist auf eine vergleichsweise lange Schalt-
dauer zurĂĽckzufĂĽhren, welche die von konventionellem SRAM ĂĽbersteigt. Des Weiteren ist auf
Grund des stochastischen Schaltvorgangs der Speicherzelle und des Einflusses der Prozessvari-
ation ein nicht zu vernachlässigender Zeitraum dafür erforderlich. In diesem Zeitraum wird ein
konstanter Schreibstrom durch die Bitzelle geleitet, um den Schaltvorgang zu gewährleisten.
Dieser Vorgang verursacht eine hohe Energieaufnahme. FĂĽr die Leseoperation wird gleicher-
maßen ein beachtliches Zeitfenster benötigt, ebenfalls bedingt durch den Einfluss der Prozess-
variation. Dem gegenüber stehen diverse Zuverlässigkeitsprobleme. Dazu gehören unter An-
derem die Leseintereferenz und andere Degenerationspobleme, wie das des Time Dependent Di-
electric Breakdowns (TDDB). Diese Zuverlässigkeitsprobleme sind wiederum auf die benötigten
längeren Schaltzeiten zurückzuführen, welche in der Folge auch einen über längere Zeit an-
liegenden Lese- bzw. Schreibstrom implizieren. Es ist daher notwendig, sowohl die Energie, als
auch die Latenz zur Steigerung der Zuverlässigkeit zu reduzieren, um daraus einen potenziellen
Kandidaten fĂĽr ein on-Chip Speichersystem zu machen.
In dieser Dissertation werden wir Entwurfsstrategien vorstellen, welche das Ziel verfolgen,
die Herausforderungen des Cache-, Register- und Flip-Flop-Entwurfs anzugehen. Dies erre-
ichen wir unter Zuhilfenahme eines Cross-Layer Ansatzes. FĂĽr Caches entwickelten wir ver-
schiedene Ansätze auf Schaltkreisebene, welche sowohl auf der Speicherarchitekturebene, als
auch auf der Systemebene in Bezug auf Energieaufnahme, Performanzsteigerung und Zuver-
lässigkeitverbesserung evaluiert werden. Wir entwickeln eine Selbstabschalttechnik, sowohl für
die Lese-, als auch die Schreiboperation von Caches. Diese ist in der Lage, den Abschluss der
entsprechenden Operation dynamisch zu ermitteln. Nachdem der Abschluss erkannt wurde,
wird die Lese- bzw. Schreiboperation sofort gestoppt, um Energie zu sparen. Zusätzlich
limitiert die Selbstabschalttechnik die Dauer des Stromflusses durch die Speicherzelle, was
wiederum das Auftreten von TDDB und Leseinterferenz bei Schreib- bzw. Leseoperationen re-
duziert. Zur Verbesserung der Schreiblatenz heben wir den Schreibstrom an der Bitzelle an, um den magnetischen Schaltprozess zu beschleunigen. Um registerbankspezifische Anforderungen
zu berücksichtigen, haben wir zusätzlich eine Multiport-Speicherarchitektur entworfen, welche
eine einzigartige Eigenschaft der SOT-Zelle ausnutzt, um simultan Lese- und Schreiboperatio-
nen auszuführen. Es ist daher möglich Lese/Schreib- Konfilkte auf Bitzellen-Ebene zu lösen,
was sich wiederum in einer sehr viel einfacheren Multiport- Registerbankarchitektur nieder-
schlägt.
Zusätzlich zu den Speicheransätzen haben wir ebenfalls zwei Flip-Flop-Architekturen vorgestellt.
Die erste ist eine nichtflĂĽchtige non-Shadow Flip-Flop-Architektur, welche die Speicherzelle als
aktive Komponente nutzt. Dies ermöglicht das sofortige An- und Ausschalten der Versorgungss-
pannung und ist daher besonders gut fĂĽr aggressives Powergating geeignet. Alles in Allem zeigt
der vorgestellte Flip-Flop-Entwurf eine ähnliche Timing-Charakteristik wie die konventioneller
CMOS Flip-Flops auf. Jedoch erlaubt er zur selben Zeit eine signifikante Reduktion der statis-
chen Leistungsaufnahme im Vergleich zu nichtflĂĽchtigen Shadow- Flip-Flops. Die zweite ist eine
fehlertolerante Flip-Flop-Architektur, welche sich unanfällig gegenüber diversen Defekten und
Fehlern verhält. Die Leistungsfähigkeit aller vorgestellten Techniken wird durch ausführliche
Simulationen auf Schaltkreisebene verdeutlicht, welche weiter durch detaillierte Evaluationen
auf Systemebene untermauert werden. Im Allgemeinen konnten wir verschiedene Techniken en-
twickeln, die erhebliche Verbesserungen in Bezug auf Performanz, Energie und Zuverlässigkeit
von spintronischen on-Chip Speichern, wie Caches, Register und Flip-Flops erreichen
STT-MRAM characterization and its test implications
Spin torque transfer (STT)-magnetoresistive random-access memory (MRAM) has come
a long way in research to meet the speed and power consumption requirements for future
memory applications. The state-of-the-art STT-MRAM bit-cells employ magnetic tunnel
junction (MTJ) with perpendicular magnetic anisotropy (PMA). The process repeatabil-
ity and yield stability for wafer fabrication are some of the critical issues encountered in
STT-MRAM mass production. Some of the yield improvement techniques to combat the
e ect of process variations have been previously explored. However, little research has been
done on defect oriented testing of STT-MRAM arrays. In this thesis, the author investi-
gates the parameter deviation and non-idealities encountered during the development of
a novel MTJ stack con guration. The characterization result provides motivation for the
development of the design for testability (DFT) scheme that can help test and characterize
STT-MRAM bit-cells and the CMOS peripheral circuitry e ciently.
The primary factors for wafer yield degradation are the device parameter variation and
its non-uniformity across the wafer due to the fabrication process non-idealities. There-
fore, e ective in-process testing strategies for exploring and verifying the impact of the
parameter variation on the wafer yield will be needed to achieve fabrication process opti-
mization. While yield depends on the CMOS process variability, quality of the deposited
MTJ lm, and other process non-idealities, test platform can enable parametric optimiza-
tion and veri cation using the CMOS-based DFT circuits. In this work, we develop a DFT
algorithm and implement a DFT circuit for parametric testing and prequali cation of the
critical circuits in the CMOS wafer. The DFT circuit successfully replicates the electrical
characteristics of MTJ devices and captures their spatial variation across the wafer with
an error of less than 4%. We estimate the yield of the read sensing path by implement-
ing the DFT circuit, which can replicate the resistance-area product variation up to 50%
from its nominal value. The yield data from the read sensing path at di erent wafer loca-
tions are analyzed, and a usable wafer radius has been estimated. Our DFT scheme can
provide quantitative feedback based on in-die measurement, enabling fabrication process
optimization through iterative estimation and veri cation of the calibrated parameters.
Another concern that prevents mass production of STT-MRAM arrays is the defect
formation in MTJ devices due to aging. Identifying manufacturing defects in the magnetic
tunnel junction (MTJ) device is crucial for the yield and reliability of spin-torque-transfer
(STT) magnetic random-access memory (MRAM) arrays. Several of the MTJ defects result
in parametric deviations of the device that deteriorate over time. We extend our work on
the DFT scheme by monitoring the electrical parameter deviations occurring due to the
defect formation over time. A programmable DFT scheme was implemented for a sub-arrayin 65 nm CMOS technology to evaluate the feasibility of the test scheme. The scheme utilizes the read sense path to compare the bit-cell electrical parameters against known
DFT cells characteristics. Built-in-self-test (BIST) methodology is utilized to trigger the
onset of the fault once the device parameter crosses a threshold value. We demonstrate
the operation and evaluate the accuracy of detection with the proposed scheme. The
DFT scheme can be exploited for monitoring aging defects, modeling their behavior and
optimization of the fabrication process.
DFT scheme could potentially nd numerous applications for parametric characteriza-
tion and fault monitoring of STT-MRAM bit-cell arrays during mass production. Some of
the applications include a) Fabrication process feedback to improve wafer turnaround time,
b) STT-MRAM bit-cell health monitoring, c) Decoupled characterization of the CMOS pe-
ripheral circuitry such as read-sensing path and sense ampli er characterization within the
STT-MRAM array. Additionally, the DFT scheme has potential applications for detec-
tion of fault formation that could be utilized for deploying redundancy schemes, providing
a graceful degradation in MTJ-based bit-cell array due to aging of the device, and also
providing feedback to improve the fabrication process and yield learning
MFPA: Mixed-Signal Field Programmable Array for Energy-Aware Compressive Signal Processing
Compressive Sensing (CS) is a signal processing technique which reduces the number of samples taken per frame to decrease energy, storage, and data transmission overheads, as well as reducing time taken for data acquisition in time-critical applications. The tradeoff in such an approach is increased complexity of signal reconstruction. While several algorithms have been developed for CS signal reconstruction, hardware implementation of these algorithms is still an area of active research. Prior work has sought to utilize parallelism available in reconstruction algorithms to minimize hardware overheads; however, such approaches are limited by the underlying limitations in CMOS technology. Herein, the MFPA (Mixed-signal Field Programmable Array) approach is presented as a hybrid spin-CMOS reconfigurable fabric specifically designed for implementation of CS data sampling and signal reconstruction. The resulting fabric consists of 1) slice-organized analog blocks providing amplifiers, transistors, capacitors, and Magnetic Tunnel Junctions (MTJs) which are configurable to achieving square/square root operations required for calculating vector norms, 2) digital functional blocks which feature 6-input clockless lookup tables for computation of matrix inverse, and 3) an MRAM-based nonvolatile crossbar array for carrying out low-energy matrix-vector multiplication operations. The various functional blocks are connected via a global interconnect and spin-based analog-to-digital converters. Simulation results demonstrate significant energy and area benefits compared to equivalent CMOS digital implementations for each of the functional blocks used: this includes an 80% reduction in energy and 97% reduction in transistor count for the nonvolatile crossbar array, 80% standby power reduction and 25% reduced area footprint for the clockless lookup tables, and roughly 97% reduction in transistor count for a multiplier built using components from the analog blocks. Moreover, the proposed fabric yields 77% energy reduction compared to CMOS when used to implement CS reconstruction, in addition to latency improvements
Emerging physical unclonable functions with nanotechnology
Physical unclonable functions (PUFs) are increasingly used for authentication and identification applications as well as the cryptographic key generation. An important feature of a PUF is the reliance on minute random variations in the fabricated hardware to derive a trusted random key. Currently, most PUF designs focus on exploiting process variations intrinsic to the CMOS technology. In recent years, progress in emerging nanoelectronic devices has demonstrated an increase in variation as a consequence of scaling down to the nanoregion. To date, emerging PUFs with nanotechnology have not been fully established, but they are expected to emerge. Initial research in this area aims to provide security primitives for emerging integrated circuits with nanotechnology. In this paper, we review emerging nanotechnology-based PUFs
FAT: An In-Memory Accelerator with Fast Addition for Ternary Weight Neural Networks
Convolutional Neural Networks (CNNs) demonstrate excellent performance in
various applications but have high computational complexity. Quantization is
applied to reduce the latency and storage cost of CNNs. Among the quantization
methods, Binary and Ternary Weight Networks (BWNs and TWNs) have a unique
advantage over 8-bit and 4-bit quantization. They replace the multiplication
operations in CNNs with additions, which are favoured on In-Memory-Computing
(IMC) devices. IMC acceleration for BWNs has been widely studied. However,
though TWNs have higher accuracy and better sparsity than BWNs, IMC
acceleration for TWNs has limited research. TWNs on existing IMC devices are
inefficient because the sparsity is not well utilized, and the addition
operation is not efficient.
In this paper, we propose FAT as a novel IMC accelerator for TWNs. First, we
propose a Sparse Addition Control Unit, which utilizes the sparsity of TWNs to
skip the null operations on zero weights. Second, we propose a fast addition
scheme based on the memory Sense Amplifier to avoid the time overhead of both
carry propagation and writing back the carry to memory cells. Third, we further
propose a Combined-Stationary data mapping to reduce the data movement of
activations and weights and increase the parallelism across memory columns.
Simulation results show that for addition operations at the Sense Amplifier
level, FAT achieves 2.00X speedup, 1.22X power efficiency, and 1.22X area
efficiency compared with a State-Of-The-Art IMC accelerator ParaPIM. FAT
achieves 10.02X speedup and 12.19X energy efficiency compared with ParaPIM on
networks with 80% average sparsity.Comment: 14 page
- …