175 research outputs found
Reconfigurable nanoelectronics using graphene based spintronic logic gates
This paper presents a novel design concept for spintronic nanoelectronics
that emphasizes a seamless integration of spin-based memory and logic circuits.
The building blocks are magneto-logic gates based on a hybrid
graphene/ferromagnet material system. We use network search engines as a
technology demonstration vehicle and present a spin-based circuit design with
smaller area, faster speed, and lower energy consumption than the
state-of-the-art CMOS counterparts. This design can also be applied in
applications such as data compression, coding and image recognition. In the
proposed scheme, over 100 spin-based logic operations are carried out before
any need for a spin-charge conversion. Consequently, supporting CMOS
electronics requires little power consumption. The spintronic-CMOS integrated
system can be implemented on a single 3-D chip. These nonvolatile logic
circuits hold potential for a paradigm shift in computing applications.Comment: 14 pages (single column), 6 figure
Energy efficient hybrid computing systems using spin devices
Emerging spin-devices like magnetic tunnel junctions (MTJ\u27s), spin-valves and domain wall magnets (DWM) have opened new avenues for spin-based logic design. This work explored potential computing applications which can exploit such devices for higher energy-efficiency and performance. The proposed applications involve hybrid design schemes, where charge-based devices supplement the spin-devices, to gain large benefits at the system level. As an example, lateral spin valves (LSV) involve switching of nanomagnets using spin-polarized current injection through a metallic channel such as Cu. Such spin-torque based devices possess several interesting properties that can be exploited for ultra-low power computation. Analog characteristic of spin current facilitate non-Boolean computation like majority evaluation that can be used to model a neuron. The magneto-metallic neurons can operate at ultra-low terminal voltage of âŒ20mV, thereby resulting in small computation power. Moreover, since nano-magnets inherently act as memory elements, these devices can facilitate integration of logic and memory in interesting ways. The spin based neurons can be integrated with CMOS and other emerging devices leading to different classes of neuromorphic/non-Von-Neumann architectures. The spin-based designs involve `mixed-mode\u27 processing and hence can provide very compact and ultra-low energy solutions for complex computation blocks, both digital as well as analog. Such low-power, hybrid designs can be suitable for various data processing applications like cognitive computing, associative memory, and currentmode on-chip global interconnects. Simulation results for these applications based on device-circuit co-simulation framework predict more than âŒ100x improvement in computation energy as compared to state of the art CMOS design, for optimal spin-device parameters
FAT: An In-Memory Accelerator with Fast Addition for Ternary Weight Neural Networks
Convolutional Neural Networks (CNNs) demonstrate excellent performance in
various applications but have high computational complexity. Quantization is
applied to reduce the latency and storage cost of CNNs. Among the quantization
methods, Binary and Ternary Weight Networks (BWNs and TWNs) have a unique
advantage over 8-bit and 4-bit quantization. They replace the multiplication
operations in CNNs with additions, which are favoured on In-Memory-Computing
(IMC) devices. IMC acceleration for BWNs has been widely studied. However,
though TWNs have higher accuracy and better sparsity than BWNs, IMC
acceleration for TWNs has limited research. TWNs on existing IMC devices are
inefficient because the sparsity is not well utilized, and the addition
operation is not efficient.
In this paper, we propose FAT as a novel IMC accelerator for TWNs. First, we
propose a Sparse Addition Control Unit, which utilizes the sparsity of TWNs to
skip the null operations on zero weights. Second, we propose a fast addition
scheme based on the memory Sense Amplifier to avoid the time overhead of both
carry propagation and writing back the carry to memory cells. Third, we further
propose a Combined-Stationary data mapping to reduce the data movement of
activations and weights and increase the parallelism across memory columns.
Simulation results show that for addition operations at the Sense Amplifier
level, FAT achieves 2.00X speedup, 1.22X power efficiency, and 1.22X area
efficiency compared with a State-Of-The-Art IMC accelerator ParaPIM. FAT
achieves 10.02X speedup and 12.19X energy efficiency compared with ParaPIM on
networks with 80% average sparsity.Comment: 14 page
Design of robust spin-transfer torque magnetic random access memories for ultralow power high performance on-chip cache applications
Spin-transfer torque magnetic random access memories (STT-MRAMs) based on magnetic tunnel junction (MTJ) has become the leading candidate for future universal memory technology due to its potential for low power, non-volatile, high speed and extremely good endurance. However, conflicting read and write requirements exist in STT-MRAM technology because the current path during read and write operations are the same. Read and write failures of STT-MRAMs are degraded further under process variations. The focus of this dissertation is to optimize the yield of STT- MRAMs under process variations by employing device-circuit-architecture co-design techniques. A devices-to-systems simulation framework was developed to evaluate the effectiveness of the techniques proposed in this dissertation. An optimization methodology for minimizing the failure probability of 1T-1MTJ STT-MRAM bit-cell by proper selection of bit-cell configuration and access transistor sizing is also proposed. A failure mitigation technique using assistsin 1T-1MTJ STT-MRAM bit-cells is also proposed and discussed. Assist techniques proposed in this dissertation to mitigate write failures either increase the amount of current available to switch the MTJ during write or decrease the required current to switch the MTJ. These techniques achieve significant reduction in bit-cell area and write power with minimal impact on bit-cell failure probability and read power. However, the proposed write assist techniques may be less effective in scaled STT-MRAM bit-cells. Furthermore, read failures need to be overcome and hence, read assist techniques are required. It has been experimentally demonstrated that a class of materials called multiferroics can enable manipulation of magnetization using electric fields via magnetoelectric effects. A read assist technique using an MTJ structure incorporating multiferroic materials is proposed and analyzed. It was found that it is very difficult to overcome the fundamental design issues with 1T-1MTJ STT-MRAM due to the two-terminal nature of the MTJ. Hence, multi-terminal MTJ structures consisting of complementary polarized pinned layers are proposed. Analysis of the proposed MTJ structures shows significant improvement in bit-cell failures. Finally, this dissertation explores two system-level applications enabled by STT-MRAMs, and shows that device-circuit-architecture co-design of STT-MRAMs is required to fully exploit its benefits
Reliable Low-Power High Performance Spintronic Memories
Moores Gesetz folgend, ist es der Chipindustrie in den letzten fĂŒnf Jahrzehnten gelungen, ein
explosionsartiges Wachstum zu erreichen. Dies hatte ebenso einen exponentiellen Anstieg der
Nachfrage von Speicherkomponenten zur Folge, was wiederum zu speicherlastigen Chips in
den heutigen Computersystemen fĂŒhrt. Allerdings stellen traditionelle on-Chip Speichertech-
nologien wie Static Random Access Memories (SRAMs), Dynamic Random Access Memories
(DRAMs) und Flip-Flops eine Herausforderung in Bezug auf Skalierbarkeit, Verlustleistung
und ZuverlĂ€ssigkeit dar. Eben jene Herausforderungen und die ĂŒberwĂ€ltigende Nachfrage
nach höherer Performanz und Integrationsdichte des on-Chip Speichers motivieren Forscher,
nach neuen nichtflĂŒchtigen Speichertechnologien zu suchen. Aufkommende spintronische Spe-
ichertechnologien wie Spin Orbit Torque (SOT) und Spin Transfer Torque (STT) erhielten
in den letzten Jahren eine hohe Aufmerksamkeit, da sie eine Reihe an Vorteilen bieten. Dazu
gehören NichtflĂŒchtigkeit, Skalierbarkeit, hohe BestĂ€ndigkeit, CMOS KompatibilitĂ€t und Unan-
fĂ€lligkeit gegenĂŒber Soft-Errors. In der Spintronik reprĂ€sentiert der Spin eines Elektrons dessen
Information. Das Datum wird durch die Höhe des Widerstandes gespeichert, welche sich durch
das Anlegen eines polarisierten Stroms an das Speichermedium verÀndern lÀsst. Das Prob-
lem der statischen Leistung gehen die SpeichergerÀte sowohl durch deren verlustleistungsfreie
Eigenschaft, als auch durch ihr Standard- Aus/Sofort-Ein Verhalten an. Nichtsdestotrotz sind
noch andere Probleme, wie die hohe Zugriffslatenz und die Energieaufnahme zu lösen, bevor
sie eine verbreitete Anwendung finden können. Um diesen Problemen gerecht zu werden, sind
neue Computerparadigmen, -architekturen und -entwurfsphilosophien notwendig.
Die hohe Zugriffslatenz der Spintroniktechnologie ist auf eine vergleichsweise lange Schalt-
dauer zurĂŒckzufĂŒhren, welche die von konventionellem SRAM ĂŒbersteigt. Des Weiteren ist auf
Grund des stochastischen Schaltvorgangs der Speicherzelle und des Einflusses der Prozessvari-
ation ein nicht zu vernachlĂ€ssigender Zeitraum dafĂŒr erforderlich. In diesem Zeitraum wird ein
konstanter Schreibstrom durch die Bitzelle geleitet, um den Schaltvorgang zu gewÀhrleisten.
Dieser Vorgang verursacht eine hohe Energieaufnahme. FĂŒr die Leseoperation wird gleicher-
maĂen ein beachtliches Zeitfenster benötigt, ebenfalls bedingt durch den Einfluss der Prozess-
variation. Dem gegenĂŒber stehen diverse ZuverlĂ€ssigkeitsprobleme. Dazu gehören unter An-
derem die Leseintereferenz und andere Degenerationspobleme, wie das des Time Dependent Di-
electric Breakdowns (TDDB). Diese ZuverlÀssigkeitsprobleme sind wiederum auf die benötigten
lĂ€ngeren Schaltzeiten zurĂŒckzufĂŒhren, welche in der Folge auch einen ĂŒber lĂ€ngere Zeit an-
liegenden Lese- bzw. Schreibstrom implizieren. Es ist daher notwendig, sowohl die Energie, als
auch die Latenz zur Steigerung der ZuverlÀssigkeit zu reduzieren, um daraus einen potenziellen
Kandidaten fĂŒr ein on-Chip Speichersystem zu machen.
In dieser Dissertation werden wir Entwurfsstrategien vorstellen, welche das Ziel verfolgen,
die Herausforderungen des Cache-, Register- und Flip-Flop-Entwurfs anzugehen. Dies erre-
ichen wir unter Zuhilfenahme eines Cross-Layer Ansatzes. FĂŒr Caches entwickelten wir ver-
schiedene AnsÀtze auf Schaltkreisebene, welche sowohl auf der Speicherarchitekturebene, als
auch auf der Systemebene in Bezug auf Energieaufnahme, Performanzsteigerung und Zuver-
lĂ€ssigkeitverbesserung evaluiert werden. Wir entwickeln eine Selbstabschalttechnik, sowohl fĂŒr
die Lese-, als auch die Schreiboperation von Caches. Diese ist in der Lage, den Abschluss der
entsprechenden Operation dynamisch zu ermitteln. Nachdem der Abschluss erkannt wurde,
wird die Lese- bzw. Schreiboperation sofort gestoppt, um Energie zu sparen. ZusÀtzlich
limitiert die Selbstabschalttechnik die Dauer des Stromflusses durch die Speicherzelle, was
wiederum das Auftreten von TDDB und Leseinterferenz bei Schreib- bzw. Leseoperationen re-
duziert. Zur Verbesserung der Schreiblatenz heben wir den Schreibstrom an der Bitzelle an, um den magnetischen Schaltprozess zu beschleunigen. Um registerbankspezifische Anforderungen
zu berĂŒcksichtigen, haben wir zusĂ€tzlich eine Multiport-Speicherarchitektur entworfen, welche
eine einzigartige Eigenschaft der SOT-Zelle ausnutzt, um simultan Lese- und Schreiboperatio-
nen auszufĂŒhren. Es ist daher möglich Lese/Schreib- Konfilkte auf Bitzellen-Ebene zu lösen,
was sich wiederum in einer sehr viel einfacheren Multiport- Registerbankarchitektur nieder-
schlÀgt.
ZusÀtzlich zu den SpeicheransÀtzen haben wir ebenfalls zwei Flip-Flop-Architekturen vorgestellt.
Die erste ist eine nichtflĂŒchtige non-Shadow Flip-Flop-Architektur, welche die Speicherzelle als
aktive Komponente nutzt. Dies ermöglicht das sofortige An- und Ausschalten der Versorgungss-
pannung und ist daher besonders gut fĂŒr aggressives Powergating geeignet. Alles in Allem zeigt
der vorgestellte Flip-Flop-Entwurf eine Àhnliche Timing-Charakteristik wie die konventioneller
CMOS Flip-Flops auf. Jedoch erlaubt er zur selben Zeit eine signifikante Reduktion der statis-
chen Leistungsaufnahme im Vergleich zu nichtflĂŒchtigen Shadow- Flip-Flops. Die zweite ist eine
fehlertolerante Flip-Flop-Architektur, welche sich unanfĂ€llig gegenĂŒber diversen Defekten und
Fehlern verhĂ€lt. Die LeistungsfĂ€higkeit aller vorgestellten Techniken wird durch ausfĂŒhrliche
Simulationen auf Schaltkreisebene verdeutlicht, welche weiter durch detaillierte Evaluationen
auf Systemebene untermauert werden. Im Allgemeinen konnten wir verschiedene Techniken en-
twickeln, die erhebliche Verbesserungen in Bezug auf Performanz, Energie und ZuverlÀssigkeit
von spintronischen on-Chip Speichern, wie Caches, Register und Flip-Flops erreichen
Computing with Spintronics: Circuits and architectures
This thesis makes the following contributions towards the design of computing platforms with spintronic devices. 1) It explores the use of spintronic memories in the design of a domain-specific processor for an emerging class of data-intensive applications, namely recognition, mining and synthesis (RMS). Two different spintronic memory technologies â Domain Wall Memory (DWM) and STT-MRAM â are utilized to realize the different levels in the memory hierarchy of the domain-specific processor, based on their respective access characteristics. Architectural tradeoffs created by the use of spintronic memories are analyzed. The proposed design achieves 1.5X-4X improvements in energy-delay product compared to a CMOS baseline. 2) It describes the first attempt to use DWM in the cache hierarchy of general-purpose processors. DWM promises unparalleled density by packing several bits of data into each bit-cell. TapeCache, the proposed DWM-based cache architecture, utilizes suitable circuit and architectural optimizations to address two key challenges (i) the high energy and latency requirement of write operations and (ii) the need for shift operations to access the data stored in each DWM bit-cell. At the circuit level, DWM bit-cells that are tailored to the distinct design requirements of different levels in the cache hierarchy are proposed. At the architecture level, TapeCache proposes suitable cache organization and management policies to alleviate the performance impact of shift operations required to access data stored in DWM bit-cells. TapeCache achieves more than 7X improvements in both cache area and energy with virtually identical performance compared to an SRAM-based cache hierarchy. 3) It investigates the design of the on-chip memory hierarchy of general-purpose graphics processing units (GPGPUs)âmassively parallel processors that are optimized for data-intensive high-throughput workloadsâusing DWM. STAG, a high density, energy-efficient Spintronic- Tape Architecture for GPGPU cache hierarchies is described. STAG utilizes different DWM bit-cells to realize different memory arrays in the GPGPU cache hierarchy. To address the challenge of high access latencies due to shifts, STAG predicts upcoming cache accesses by leveraging unique characteristics of GPGPU architectures and workloads, and prefetches data that are both likely to be accessed and require large numbers of shift operations. STAG achieves 3.3X energy reduction and 12.1% performance improvement over CMOS SRAM under iso-area conditions. 4) While the potential of spintronic devices for memories is widely recognized, their utility in realizing logic is much less clear. The thesis presents Spintastic, a new paradigm that utilizes Stochastic Computing (SC) to realize spintronic logic. In SC, data is encoded in the form of pseudo-random bitstreams, such that the probability of a \u271\u27 in a bitstream corresponds to the numerical value that it represents. SC can enable compact, low-complexity logic implementations of various arithmetic functions. Spintastic establishes the synergy between stochastic computing and spin-based logic by demonstrating that they mutually alleviate each other\u27s limitations. On the one hand, various building blocks of SC, which incur significant overheads in CMOS implementations, can be efficiently realized by exploiting the physical characteristics of spin devices. On the other hand, the reduced logic complexity and low logic depth of SC circuits alleviates the shortcomings of spintronic logic. Based on this insight, the design of spin-based stochastic arithmetic circuits, bitstream generators, bitstream permuters and stochastic-to-binary converter circuits are presented. Spintastic achieves 7.1X energy reduction over CMOS implementations for a wide range of benchmarks from the image processing, signal processing, and RMS application domains. 5) In order to evaluate the proposed spintronic designs, the thesis describes various device-to-architecture modeling frameworks. Starting with devices models that are calibrated to measurements, the characteristics of spintronic devices are successively abstracted into circuit-level and architectural models, which are incorporated into suitable simulation frameworks. (Abstract shortened by UMI.
Exploring Spin-transfer-torque devices and memristors for logic and memory applications
As scaling CMOS devices is approaching its physical limits, researchers have begun exploring newer devices and architectures to replace CMOS.
Due to their non-volatility and high density, Spin Transfer Torque (STT) devices are among the most prominent candidates for logic and memory applications. In this research, we first considered a new logic style called All Spin Logic (ASL). Despite its advantages, ASL consumes a large amount of static power; thus, several optimizations can be performed to address this issue. We developed a systematic methodology to perform the optimizations to ensure stable operation of ASL.
Second, we investigated reliable design of STT-MRAM bit-cells and addressed the conflicting read and write requirements, which results in overdesign of the bit-cells. Further, a Device/Circuit/Architecture co-design framework was developed to optimize the STT-MRAM devices by exploring the design space through jointly considering yield enhancement techniques at different levels of abstraction.
Recent advancements in the development of memristive devices have opened new opportunities for hardware implementation of non-Boolean computing. To this end, the suitability of memristive devices for swarm intelligence algorithms has enabled researchers to solve a maze in hardware. In this research, we utilized swarm intelligence of memristive networks to perform image edge detection. First, we proposed a hardware-friendly algorithm for image edge detection based on ant colony. Next, we designed the image edge detection algorithm using memristive networks
Beyond Moore's technologies: operation principles of a superconductor alternative
The predictions of Moore's law are considered by experts to be valid until
2020 giving rise to "post-Moore's" technologies afterwards. Energy efficiency
is one of the major challenges in high-performance computing that should be
answered. Superconductor digital technology is a promising post-Moore's
alternative for the development of supercomputers. In this paper, we consider
operation principles of an energy-efficient superconductor logic and memory
circuits with a short retrospective review of their evolution. We analyze their
shortcomings in respect to computer circuits design. Possible ways of further
research are outlined.Comment: OPEN ACCES
X-SRAM: Enabling In-Memory Boolean Computations in CMOS Static Random Access Memories
Silicon-based Static Random Access Memories (SRAM) and digital Boolean logic
have been the workhorse of the state-of-art computing platforms. Despite
tremendous strides in scaling the ubiquitous metal-oxide-semiconductor
transistor, the underlying \textit{von-Neumann} computing architecture has
remained unchanged. The limited throughput and energy-efficiency of the
state-of-art computing systems, to a large extent, results from the well-known
\textit{von-Neumann bottleneck}. The energy and throughput inefficiency of the
von-Neumann machines have been accentuated in recent times due to the present
emphasis on data-intensive applications like artificial intelligence, machine
learning \textit{etc}. A possible approach towards mitigating the overhead
associated with the von-Neumann bottleneck is to enable \textit{in-memory}
Boolean computations. In this manuscript, we present an augmented version of
the conventional SRAM bit-cells, called \textit{the X-SRAM}, with the ability
to perform in-memory, vector Boolean computations, in addition to the usual
memory storage operations. We propose at least six different schemes for
enabling in-memory vector computations including NAND, NOR, IMP (implication),
XOR logic gates with respect to different bit-cell topologies the 8T cell
and the 8T Differential cell. In addition, we also present a novel
\textit{`read-compute-store'} scheme, wherein the computed Boolean function can
be directly stored in the memory without the need of latching the data and
carrying out a subsequent write operation. The feasibility of the proposed
schemes has been verified using predictive transistor models and Monte-Carlo
variation analysis.Comment: This article has been accepted in a future issue of IEEE Transactions
on Circuits and Systems-I: Regular Paper
- âŠ