292 research outputs found
Bio-inspired learning and hardware acceleration with emerging memories
Machine Learning has permeated many aspects of engineering, ranging from the Internet of Things (IoT) applications to big data analytics. While computing resources available to implement these algorithms have become more powerful, both in terms of the complexity of problems that can be solved and the overall computing speed, the huge energy costs involved remains a significant challenge. The human brain, which has evolved over millions of years, is widely accepted as the most efficient control and cognitive processing platform. Neuro-biological studies have established that information processing in the human brain relies on impulse like signals emitted by neurons called action potentials. Motivated by these facts, the Spiking Neural Networks (SNNs), which are a bio-plausible version of neural networks have been proposed as an alternative computing paradigm where the timing of spikes generated by artificial neurons is central to its learning and inference capabilities. This dissertation demonstrates the computational power of the SNNs using conventional CMOS and emerging nanoscale hardware platforms.
The first half of this dissertation presents an SNN architecture which is trained using a supervised spike-based learning algorithm for the handwritten digit classification problem. This network achieves an accuracy of 98.17% on the MNIST test data-set, with about 4X fewer parameters compared to the state-of-the-art neural networks achieving over 99% accuracy. In addition, a scheme for parallelizing and speeding up the SNN simulation on a GPU platform is presented. The second half of this dissertation presents an optimal hardware design for accelerating SNN inference and training with SRAM (Static Random Access Memory) and nanoscale non-volatile memory (NVM) crossbar arrays. Three prominent NVM devices are studied for realizing hardware accelerators for SNNs: Phase Change Memory (PCM), Spin Transfer Torque RAM (STT-RAM) and Resistive RAM (RRAM). The analysis shows that a spike-based inference engine with crossbar arrays of STT-RAM bit-cells is 2X and 5X more efficient compared to PCM and RRAM memories, respectively. Furthermore, the STT-RAM design has nearly 6X higher throughput per unit Watt per unit area than that of an equivalent SRAM-based (Static Random Access Memory) design. A hardware accelerator with on-chip learning on an STT-RAM memory array is also designed, requiring bits of floating-point synaptic weight precision to reach the baseline SNN algorithmic performance on the MNIST dataset. The complete design with STT-RAM crossbar array achieves nearly 20X higher throughput per unit Watt per unit mm^2 than an equivalent design with SRAM memory.
In summary, this work demonstrates the potential of spike-based neuromorphic computing algorithms and its efficient realization in hardware based on conventional CMOS as well as emerging technologies. The schemes presented here can be further extended to design spike-based systems that can be ubiquitously deployed for energy and memory constrained edge computing applications
Computing with Spintronics: Circuits and architectures
This thesis makes the following contributions towards the design of computing platforms with spintronic devices. 1) It explores the use of spintronic memories in the design of a domain-specific processor for an emerging class of data-intensive applications, namely recognition, mining and synthesis (RMS). Two different spintronic memory technologies — Domain Wall Memory (DWM) and STT-MRAM — are utilized to realize the different levels in the memory hierarchy of the domain-specific processor, based on their respective access characteristics. Architectural tradeoffs created by the use of spintronic memories are analyzed. The proposed design achieves 1.5X-4X improvements in energy-delay product compared to a CMOS baseline. 2) It describes the first attempt to use DWM in the cache hierarchy of general-purpose processors. DWM promises unparalleled density by packing several bits of data into each bit-cell. TapeCache, the proposed DWM-based cache architecture, utilizes suitable circuit and architectural optimizations to address two key challenges (i) the high energy and latency requirement of write operations and (ii) the need for shift operations to access the data stored in each DWM bit-cell. At the circuit level, DWM bit-cells that are tailored to the distinct design requirements of different levels in the cache hierarchy are proposed. At the architecture level, TapeCache proposes suitable cache organization and management policies to alleviate the performance impact of shift operations required to access data stored in DWM bit-cells. TapeCache achieves more than 7X improvements in both cache area and energy with virtually identical performance compared to an SRAM-based cache hierarchy. 3) It investigates the design of the on-chip memory hierarchy of general-purpose graphics processing units (GPGPUs)—massively parallel processors that are optimized for data-intensive high-throughput workloads—using DWM. STAG, a high density, energy-efficient Spintronic- Tape Architecture for GPGPU cache hierarchies is described. STAG utilizes different DWM bit-cells to realize different memory arrays in the GPGPU cache hierarchy. To address the challenge of high access latencies due to shifts, STAG predicts upcoming cache accesses by leveraging unique characteristics of GPGPU architectures and workloads, and prefetches data that are both likely to be accessed and require large numbers of shift operations. STAG achieves 3.3X energy reduction and 12.1% performance improvement over CMOS SRAM under iso-area conditions. 4) While the potential of spintronic devices for memories is widely recognized, their utility in realizing logic is much less clear. The thesis presents Spintastic, a new paradigm that utilizes Stochastic Computing (SC) to realize spintronic logic. In SC, data is encoded in the form of pseudo-random bitstreams, such that the probability of a \u271\u27 in a bitstream corresponds to the numerical value that it represents. SC can enable compact, low-complexity logic implementations of various arithmetic functions. Spintastic establishes the synergy between stochastic computing and spin-based logic by demonstrating that they mutually alleviate each other\u27s limitations. On the one hand, various building blocks of SC, which incur significant overheads in CMOS implementations, can be efficiently realized by exploiting the physical characteristics of spin devices. On the other hand, the reduced logic complexity and low logic depth of SC circuits alleviates the shortcomings of spintronic logic. Based on this insight, the design of spin-based stochastic arithmetic circuits, bitstream generators, bitstream permuters and stochastic-to-binary converter circuits are presented. Spintastic achieves 7.1X energy reduction over CMOS implementations for a wide range of benchmarks from the image processing, signal processing, and RMS application domains. 5) In order to evaluate the proposed spintronic designs, the thesis describes various device-to-architecture modeling frameworks. Starting with devices models that are calibrated to measurements, the characteristics of spintronic devices are successively abstracted into circuit-level and architectural models, which are incorporated into suitable simulation frameworks. (Abstract shortened by UMI.
Spiking Dynamics in Dual Free Layer Perpendicular Magnetic Tunnel Junctions
Spintronic devices have recently attracted a lot of attention in the field of
unconventional computing due to their non-volatility for short and long term
memory, non-linear fast response and relatively small footprint. Here we report
how voltage driven magnetization dynamics of dual free layer perpendicular
magnetic tunnel junctions enable to emulate spiking neurons in hardware. The
output spiking rate was controlled by varying the dc bias voltage across the
device. The field-free operation of this two terminal device and its robustness
against an externally applied magnetic field make it a suitable candidate to
mimic neuron response in a dense Neural Network (NN). The small energy
consumption of the device (4-16 pJ/spike) and its scalability are important
benefits for embedded applications. This compact perpendicular magnetic tunnel
junction structure could finally bring spiking neural networks (SNN) to
sub-100nm size elements
Stochastic Memory Devices for Security and Computing
With the widespread use of mobile computing and internet of things, secured communication and chip authentication have become extremely important. Hardware-based security concepts generally provide the best performance in terms of a good standard of security, low power consumption, and large-area density. In these concepts, the stochastic properties of nanoscale devices, such as the physical and geometrical variations of the process, are harnessed for true random number generators (TRNGs) and physical unclonable functions (PUFs). Emerging memory devices, such as resistive-switching memory (RRAM), phase-change memory (PCM), and spin-transfer torque magnetic memory (STT-MRAM), rely on a unique combination of physical mechanisms for transport and switching, thus appear to be an ideal source of entropy for TRNGs and PUFs. An overview of stochastic phenomena in memory devices and their use for developing security and computing primitives is provided. First, a broad classification of methods to generate true random numbers via the stochastic properties of nanoscale devices is presented. Then, practical implementations of stochastic TRNGs, such as hardware security and stochastic computing, are shown. Finally, future challenges to stochastic memory development are discussed
Recommended from our members
Probabilistic design for emerging memory and nanometer-scale logic
As semiconductor technology has scaled down, the impact of stochastic behavior in very large scale integrated circuits (VLSI) has become an ever-more important concern. This dissertation investigates two distinct classes of problems that require the use of probabilistic methods and models: (1) Modeling and exploiting stochastic behavior in advanced memory technologies; (2) Probabilistic modeling of faults due to on-chip voltage variation.
This dissertation first investigates the unique physics-level stochasticity of spin-transfer torque magnetic RAM (STT-RAM). The write process of STT-RAM is stochastic: specifically, the write time of a bitcell varies significantly. The wors-tcase approach, which uses the longest write pulse duration, guarantees a successful write; however, it introduces significant energy overhead due to excessive margins since the average write pulse duration is far shorter than the worst-case pulse duration. This dissertation develops novel circuit techniques to exploit the stochastic properties of STT-RAM write operation for energy savings by moving away from the worst-case approach to dynamic strategies while maintaining the required low error rate. The first contribution is a variable energy write (VEW) architecture that effectively exploits the wide distribution of write time to greatly reduce energy via a mechanism that checks the instantaneous state of the bitcell and deactivates the write current once the correct value has registered. The second contribution is a multiple attempt write (MAW) strategy that utilizes the asymptotic temporal stochastic independence of repeated switching events to achieve a dramatic reduction in energy. The proposed architectures are evaluated using a compact STT-RAM cell model. Analysis indicates that VEW succeeded in reducing the write energy by 94.7% with approximately 1% relative area overhead under an efficient design methodology compared with the conventional designs relying on the worst case approach. MAW reduced the overall write energy by 94.6% with approximately 0.05% relative area overhead.
This dissertation then addresses the problem of probabilistic modeling of faults due to on-chip voltage variations. The power supply voltage variation can increase gate delay, resulting in timing faults on near-critical paths. These low-level faults ultimately propagate to architecture and application levels, often leading to critical system failures. Developing an accurate fault model and injection tool that generates and propagates faults from circuit- to gate-level is important for accurately predicting the resulting system failures. This is challenging since the model needs to accurately capture the physical characteristics at the circuit level that define the likelihood of a fault and use that information to guide the injection with the proper probability. At the same time, the analysis and fault injections need to be computationally manageable to allow analysis of realistic systems under realistic workloads. The conventional fault models rely on either Monte Carlo sampling or time-consuming runtime simulation using the worst-case voltage drop. To overcome simulation overheads of runtime circuit-level simulation, a novel two-phase approach is proposed. The main idea is that circuit characterization can be done before simulation. The result of pre-characterization is used at runtime via a form of look-up to enable gate-level efficiency. The two-phase methodology is time-efficient but may require high memory unless the look-up tables are carefully optimized. This dissertation also develops the fault probability estimation based on workload-specific voltage distribution, rather than a fixed worst-case voltage. The proposed methodology is implemented on an OpenSPARC design targeting on a 32nm technology node. Analysis indicates the proposed fault modeling and injection flow reduces runtime overhead by 24X compared to the previously best-known gate-level fault simulator while having circuit level accuracy.Electrical and Computer Engineerin
- …