1,445 research outputs found

    Efficient Neuromorphic Computing Enabled by Spin-Transfer Torque: Devices, Circuits and Systems

    Get PDF
    Present day computers expend orders of magnitude more computational resources to perform various cognitive and perception related tasks that humans routinely perform everyday. This has recently resulted in a seismic shift in the field of computation where research efforts are being directed to develop a neurocomputer that attempts to mimic the human brain by nanoelectronic components and thereby harness its efficiency in recognition problems. Bridging the gap between neuroscience and nanoelectronics, this thesis demonstrates the encoding of biological neural and synaptic functionalities in the underlying physics of electron spin. Description of various spin-transfer torque mechanisms that can be potentially utilized for realizing neuro-mimetic device structures is provided. A cross-layer perspective extending from the device to the circuit and system level is presented to envision the design of an All-Spin neuromorphic processor enabled with on-chip learning functionalities. Device-circuit-algorithm co-simulation framework calibrated to experimental results suggest that such All-Spin neuromorphic systems can potentially achieve almost two orders of magnitude energy improvement in comparison to state-of-the-art CMOS implementations

    Quantized Non-Volatile Nanomagnetic Synapse based Autoencoder for Efficient Unsupervised Network Anomaly Detection

    Full text link
    In the autoencoder based anomaly detection paradigm, implementing the autoencoder in edge devices capable of learning in real-time is exceedingly challenging due to limited hardware, energy, and computational resources. We show that these limitations can be addressed by designing an autoencoder with low-resolution non-volatile memory-based synapses and employing an effective quantized neural network learning algorithm. We propose a ferromagnetic racetrack with engineered notches hosting a magnetic domain wall (DW) as the autoencoder synapses, where limited state (5-state) synaptic weights are manipulated by spin orbit torque (SOT) current pulses. The performance of anomaly detection of the proposed autoencoder model is evaluated on the NSL-KDD dataset. Limited resolution and DW device stochasticity aware training of the autoencoder is performed, which yields comparable anomaly detection performance to the autoencoder having floating-point precision weights. While the limited number of quantized states and the inherent stochastic nature of DW synaptic weights in nanoscale devices are known to negatively impact the performance, our hardware-aware training algorithm is shown to leverage these imperfect device characteristics to generate an improvement in anomaly detection accuracy (90.98%) compared to accuracy obtained with floating-point trained weights. Furthermore, our DW-based approach demonstrates a remarkable reduction of at least three orders of magnitude in weight updates during training compared to the floating-point approach, implying substantial energy savings for our method. This work could stimulate the development of extremely energy efficient non-volatile multi-state synapse-based processors that can perform real-time training and inference on the edge with unsupervised data

    Autonomous Probabilistic Coprocessing with Petaflips per Second

    Full text link
    In this paper we present a concrete design for a probabilistic (p-) computer based on a network of p-bits, robust classical entities fluctuating between -1 and +1, with probabilities that are controlled through an input constructed from the outputs of other p-bits. The architecture of this probabilistic computer is similar to a stochastic neural network with the p-bit playing the role of a binary stochastic neuron, but with one key difference: there is no sequencer used to enforce an ordering of p-bit updates, as is typically required. Instead, we explore \textit{sequencerless} designs where all p-bits are allowed to flip autonomously and demonstrate that such designs can allow ultrafast operation unconstrained by available clock speeds without compromising the solution's fidelity. Based on experimental results from a hardware benchmark of the autonomous design and benchmarked device models, we project that a nanomagnetic implementation can scale to achieve petaflips per second with millions of neurons. A key contribution of this paper is the focus on a hardware metric −- flips per second −- as a problem and substrate-independent figure-of-merit for an emerging class of hardware annealers known as Ising Machines. Much like the shrinking feature sizes of transistors that have continually driven Moore's Law, we believe that flips per second can be continually improved in later technology generations of a wide class of probabilistic, domain specific hardware.Comment: 13 pages, 8 figures, 1 tabl

    Algorithm and Hardware Co-design for Learning On-a-chip

    Get PDF
    abstract: Machine learning technology has made a lot of incredible achievements in recent years. It has rivalled or exceeded human performance in many intellectual tasks including image recognition, face detection and the Go game. Many machine learning algorithms require huge amount of computation such as in multiplication of large matrices. As silicon technology has scaled to sub-14nm regime, simply scaling down the device cannot provide enough speed-up any more. New device technologies and system architectures are needed to improve the computing capacity. Designing specific hardware for machine learning is highly in demand. Efforts need to be made on a joint design and optimization of both hardware and algorithm. For machine learning acceleration, traditional SRAM and DRAM based system suffer from low capacity, high latency, and high standby power. Instead, emerging memories, such as Phase Change Random Access Memory (PRAM), Spin-Transfer Torque Magnetic Random Access Memory (STT-MRAM), and Resistive Random Access Memory (RRAM), are promising candidates providing low standby power, high data density, fast access and excellent scalability. This dissertation proposes a hierarchical memory modeling framework and models PRAM and STT-MRAM in four different levels of abstraction. With the proposed models, various simulations are conducted to investigate the performance, optimization, variability, reliability, and scalability. Emerging memory devices such as RRAM can work as a 2-D crosspoint array to speed up the multiplication and accumulation in machine learning algorithms. This dissertation proposes a new parallel programming scheme to achieve in-memory learning with RRAM crosspoint array. The programming circuitry is designed and simulated in TSMC 65nm technology showing 900X speedup for the dictionary learning task compared to the CPU performance. From the algorithm perspective, inspired by the high accuracy and low power of the brain, this dissertation proposes a bio-plausible feedforward inhibition spiking neural network with Spike-Rate-Dependent-Plasticity (SRDP) learning rule. It achieves more than 95% accuracy on the MNIST dataset, which is comparable to the sparse coding algorithm, but requires far fewer number of computations. The role of inhibition in this network is systematically studied and shown to improve the hardware efficiency in learning.Dissertation/ThesisDoctoral Dissertation Electrical Engineering 201
    • …
    corecore