369 research outputs found

    Neuro-memristive Circuits for Edge Computing: A review

    Full text link
    The volume, veracity, variability, and velocity of data produced from the ever-increasing network of sensors connected to Internet pose challenges for power management, scalability, and sustainability of cloud computing infrastructure. Increasing the data processing capability of edge computing devices at lower power requirements can reduce several overheads for cloud computing solutions. This paper provides the review of neuromorphic CMOS-memristive architectures that can be integrated into edge computing devices. We discuss why the neuromorphic architectures are useful for edge devices and show the advantages, drawbacks and open problems in the field of neuro-memristive circuits for edge computing

    Memory Devices and A/D Interfaces: Design Trade-offs in Mixed-Signal Accelerators for Machine Learning Applications

    Get PDF
    This tutorial focuses on memory elements and analog/digital (A/D) interfaces used in mixed-signal accelerators for deep neural networks (DNNs) in machine learning (ML) applications. These very dedicated systems exploit analog in-memory computation (AiMC) of weights and input activations to accelerate the DNN algorithm. The co-optimization of the memory cell storing the weights with the peripheral circuits is mandatory for improving the performance metrics of the accelerator. In this tutorial, four memory devices for AiMC are reported and analyzed with their computation scheme, including the digital-to-analog converter (DAC). Moreover, we review analog-to-digital converters (ADCs) for the quantization of the AiMC results, focusing on the design trade-offs of the different topologies given by the context

    Efficient Deep Neural Network Accelerator Using Controlled Ferroelectric Domain Dynamics

    Full text link
    The current work reports an efficient deep neural network (DNN) accelerator where synaptic weight elements are controlled by ferroelectric domain dynamics. An integrated device-to-algorithm framework for benchmarking novel synaptic devices is used. In P(VDF-TrFE) based ferroelectric tunnel junctions, analog conductance states are measured using a custom pulsing protocol and associated custom circuits and array architectures for DNN training is simulated. Our results show precise control of polarization switching dynamics in multi-domain, polycrystalline ferroelectric thin films can produce considerable weight update linearity in metal-ferroelectric-semiconductor (MFS) tunnel junctions. Ultrafast switching and low junction current in these devices offer extremely energy efficient operation. Through an integrated platform of hardware development, characterization and modelling, we predict the available conductance range where linearity is expected under identical potentiating and depressing pulses for efficient DNN training and inference tasks. As an example, an analog crossbar based DNN accelerator with MFS junctions as synaptic weight elements showed ~ 93% training accuracy on large MNIST handwritten digit dataset while for cropped images, a 95% accuracy is achieved. One observed challenge is rather limited dynamic conductance range while operating under identical potentiating and depressing pulses below 1V. Investigation is underway for improving the dynamic conductance range without losing the weight update linearity

    Neuromorphic Computing with Deeply Scaled Ferroelectric FinFET in Presence of Process Variation, Device Aging and Flicker Noise

    Full text link
    This paper reports a comprehensive study on the applicability of ultra-scaled ferroelectric FinFETs with 6 nm thick hafnium zirconium oxide layer for neuromorphic computing in the presence of process variation, flicker noise, and device aging. An intricate study has been conducted about the impact of such variations on the inference accuracy of pre-trained neural networks consisting of analog, quaternary (2-bit/cell) and binary synapse. A pre-trained neural network with 97.5% inference accuracy on the MNIST dataset has been adopted as the baseline. Process variation, flicker noise, and device aging characterization have been performed and a statistical model has been developed to capture all these effects during neural network simulation. Extrapolated retention above 10 years have been achieved for binary read-out procedure. We have demonstrated that the impact of (1) retention degradation due to the oxide thickness scaling, (2) process variation, and (3) flicker noise can be abated in ferroelectric FinFET based binary neural networks, which exhibits superior performance over quaternary and analog neural network, amidst all variations. The performance of a neural network is the result of coalesced performance of device, architecture and algorithm. This research corroborates the applicability of deeply scaled ferroelectric FinFETs for non-von Neumann computing with proper combination of architecture and algorithm

    Efficient hardware implementations of bio-inspired networks

    Get PDF
    The human brain, with its massive computational capability and power efficiency in small form factor, continues to inspire the ultimate goal of building machines that can perform tasks without being explicitly programmed. In an effort to mimic the natural information processing paradigms observed in the brain, several neural network generations have been proposed over the years. Among the neural networks inspired by biology, second-generation Artificial or Deep Neural Networks (ANNs/DNNs) use memoryless neuron models and have shown unprecedented success surpassing humans in a wide variety of tasks. Unlike ANNs, third-generation Spiking Neural Networks (SNNs) closely mimic biological neurons by operating on discrete and sparse events in time called spikes, which are obtained by the time integration of previous inputs. Implementation of data-intensive neural network models on computers based on the von Neumann architecture is mainly limited by the continuous data transfer between the physically separated memory and processing units. Hence, non-von Neumann architectural solutions are essential for processing these memory-intensive bio-inspired neural networks in an energy-efficient manner. Among the non-von Neumann architectures, implementations employing non-volatile memory (NVM) devices are most promising due to their compact size and low operating power. However, it is non-trivial to integrate these nanoscale devices on conventional computational substrates due to their non-idealities, such as limited dynamic range, finite bit resolution, programming variability, etc. This dissertation demonstrates the architectural and algorithmic optimizations of implementing bio-inspired neural networks using emerging nanoscale devices. The first half of the dissertation focuses on the hardware acceleration of DNN implementations. A 4-layer stochastic DNN in a crossbar architecture with memristive devices at the cross point is analyzed for accelerating DNN training. This network is then used as a baseline to explore the impact of experimental memristive device behavior on network performance. Programming variability is found to have a critical role in determining network performance compared to other non-ideal characteristics of the devices. In addition, noise-resilient inference engines are demonstrated using stochastic memristive DNNs with 100 bits for stochastic encoding during inference and 10 bits for the expensive training. The second half of the dissertation focuses on a novel probabilistic framework for SNNs using the Generalized Linear Model (GLM) neurons for capturing neuronal behavior. This work demonstrates that probabilistic SNNs have comparable perform-ance against equivalent ANNs on two popular benchmarks - handwritten-digit classification and human activity recognition. Considering the potential of SNNs in energy-efficient implementations, a hardware accelerator for inference is proposed, termed as Spintronic Accelerator for Probabilistic SNNs (SpinAPS). The learning algorithm is optimized for a hardware friendly implementation and uses first-to-spike decoding scheme for low latency inference. With binary spintronic synapses and digital CMOS logic neurons for computations, SpinAPS achieves a performance improvement of 4x in terms of GSOPS/W/mm2^2 when compared to a conventional SRAM-based design. Collectively, this work demonstrates the potential of emerging memory technologies in building energy-efficient hardware architectures for deep and spiking neural networks. The design strategies adopted in this work can be extended to other spike and non-spike based systems for building embedded solutions having power/energy constraints

    Using the IBM Analog In-Memory Hardware Acceleration Kit for Neural Network Training and Inference

    Full text link
    Analog In-Memory Computing (AIMC) is a promising approach to reduce the latency and energy consumption of Deep Neural Network (DNN) inference and training. However, the noisy and non-linear device characteristics, and the non-ideal peripheral circuitry in AIMC chips, require adapting DNNs to be deployed on such hardware to achieve equivalent accuracy to digital computing. In this tutorial, we provide a deep dive into how such adaptations can be achieved and evaluated using the recently released IBM Analog Hardware Acceleration Kit (AIHWKit), freely available at https://github.com/IBM/aihwkit. The AIHWKit is a Python library that simulates inference and training of DNNs using AIMC. We present an in-depth description of the AIHWKit design, functionality, and best practices to properly perform inference and training. We also present an overview of the Analog AI Cloud Composer, that provides the benefits of using the AIHWKit simulation platform in a fully managed cloud setting. Finally, we show examples on how users can expand and customize AIHWKit for their own needs. This tutorial is accompanied by comprehensive Jupyter Notebook code examples that can be run using AIHWKit, which can be downloaded from https://github.com/IBM/aihwkit/tree/master/notebooks/tutorial
    corecore