Neuro-memristive Circuits for Edge Computing: A review
The volume, veracity, variability, and velocity of data produced from the
ever-increasing network of sensors connected to the Internet pose challenges
for the power management, scalability, and sustainability of cloud computing
infrastructure. Increasing the data processing capability of edge computing
devices at lower power requirements can reduce several overheads for cloud
computing solutions. This paper provides a review of neuromorphic
CMOS-memristive architectures that can be integrated into edge computing
devices. We discuss why neuromorphic architectures are useful for edge devices
and present the advantages, drawbacks, and open problems in the field of
neuro-memristive circuits for edge computing.
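To make the core idea behind such architectures concrete, the following minimal NumPy sketch (with hypothetical conductance and voltage values) shows how a memristive crossbar computes a vector-matrix product in the analog domain through Ohm's and Kirchhoff's laws, which is the operation these circuits accelerate at low power:

```python
import numpy as np

# Hypothetical 4x3 crossbar: each cross-point memristor stores a conductance.
# Rows receive input voltages; each column wire sums the resulting currents
# (Kirchhoff's current law), so every column computes one dot product.
G = np.array([[1.0, 0.4, 0.7],
              [0.2, 0.9, 0.5],
              [0.6, 0.1, 0.8],
              [0.3, 0.7, 0.2]]) * 1e-6   # conductances in siemens (illustrative)

v_in = np.array([0.2, 0.0, 0.1, 0.3])    # input voltages applied to the rows (volts)

# Column currents: I_j = sum_i V_i * G_ij, i.e. a vector-matrix multiply
# performed by the physics of the array rather than by digital arithmetic.
i_out = v_in @ G
print(i_out)
```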
Memory Devices and A/D Interfaces: Design Trade-offs in Mixed-Signal Accelerators for Machine Learning Applications
This tutorial focuses on memory elements and analog/digital (A/D) interfaces used in mixed-signal accelerators for deep neural networks (DNNs) in machine learning (ML) applications. These highly dedicated systems exploit analog in-memory computation (AiMC) of weights and input activations to accelerate DNN algorithms. Co-optimizing the memory cell that stores the weights with the peripheral circuits is mandatory for improving the performance metrics of the accelerator. In this tutorial, four memory devices for AiMC are reported and analyzed together with their computation schemes,
including the digital-to-analog converter (DAC). Moreover, we review analog-to-digital converters (ADCs) for the quantization of the AiMC results, focusing on the design trade-offs of the different topologies in this context.
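As a rough illustration of the DAC/ADC trade-off discussed in this tutorial, the sketch below (NumPy, with arbitrary bit widths and full-scale ranges chosen purely for illustration) quantizes input activations with a DAC, accumulates the analog multiply-accumulate result along one crossbar column, and re-quantizes it with an ADC; the ADC resolution and input range bound the precision of the returned partial sum:

```python
import numpy as np

def quantize(x, n_bits, full_scale):
    """Uniform quantizer: clip to [0, full_scale], round to 2**n_bits levels."""
    levels = 2 ** n_bits - 1
    x = np.clip(x, 0.0, full_scale)
    return np.round(x / full_scale * levels) / levels * full_scale

rng = np.random.default_rng(0)
weights = rng.uniform(0.0, 1.0, size=128)       # conductances along one column
activations = rng.uniform(0.0, 1.0, size=128)   # input activations for that column

# DAC: activations enter the array as discrete voltage levels (4-bit DAC assumed).
v_in = quantize(activations, n_bits=4, full_scale=1.0)

# Analog MAC: the column wire accumulates all 128 products at analog precision.
analog_sum = float(np.dot(v_in, weights))

# ADC: the accumulated result is digitized; its resolution and full-scale range
# set the quantization error of the partial sum (the design trade-off above).
digital_sum = quantize(analog_sum, n_bits=8, full_scale=0.5 * len(weights))

print(analog_sum, digital_sum, abs(analog_sum - digital_sum))
```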
Efficient Deep Neural Network Accelerator Using Controlled Ferroelectric Domain Dynamics
The current work reports an efficient deep neural network (DNN) accelerator
where synaptic weight elements are controlled by ferroelectric domain dynamics.
An integrated device-to-algorithm framework for benchmarking novel synaptic
devices is used. In P(VDF-TrFE)-based ferroelectric tunnel junctions, analog
conductance states are measured using a custom pulsing protocol, and the
associated custom circuits and array architectures for DNN training are
simulated. Our results show that precise control of polarization switching dynamics in
multi-domain, polycrystalline ferroelectric thin films can produce considerable
weight update linearity in metal-ferroelectric-semiconductor (MFS) tunnel
junctions. Ultrafast switching and low junction current in these devices offer
extremely energy efficient operation. Through an integrated platform of
hardware development, characterization and modelling, we predict the available
conductance range where linearity is expected under identical potentiating and
depressing pulses for efficient DNN training and inference tasks. As an
example, an analog crossbar-based DNN accelerator with MFS junctions as
synaptic weight elements showed ~93% training accuracy on the large MNIST
handwritten-digit dataset, while a 95% accuracy is achieved for cropped images.
One observed challenge is the rather limited dynamic conductance range when
operating under identical potentiating and depressing pulses below 1 V.
Investigations are underway to improve the dynamic conductance range without
losing the weight update linearity.
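The linearity issue raised above can be illustrated with a generic soft-bounds synaptic update model (an illustrative sketch only, not the MFS device model or its measured parameters): each identical potentiating or depressing pulse moves the conductance by an amount proportional to the remaining headroom, so the update is approximately linear only within a sub-range of the full conductance window, which is why the usable conductance range matters for training:

```python
import numpy as np

# Generic soft-bounds update model (hypothetical rate constant ALPHA).
G_MIN, G_MAX = 0.0, 1.0   # normalized conductance bounds
ALPHA = 0.05              # fractional update per pulse

def potentiate(g):
    return g + ALPHA * (G_MAX - g)   # step shrinks as g approaches G_MAX

def depress(g):
    return g - ALPHA * (g - G_MIN)   # step shrinks as g approaches G_MIN

g, trace = 0.0, [0.0]
for _ in range(100):                 # 100 identical potentiating pulses
    g = potentiate(g)
    trace.append(g)

steps = np.diff(trace)
# Near the lower bound the steps are large and nearly uniform (linear regime);
# near the upper bound they collapse, degrading weight-update linearity.
print(steps[:5], steps[-5:])
```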
Neuromorphic Computing with Deeply Scaled Ferroelectric FinFET in Presence of Process Variation, Device Aging and Flicker Noise
This paper reports a comprehensive study on the applicability of ultra-scaled
ferroelectric FinFETs with 6 nm thick hafnium zirconium oxide layer for
neuromorphic computing in the presence of process variation, flicker noise, and
device aging. A detailed study has been conducted on the impact of such
variations on the inference accuracy of pre-trained neural networks consisting
of analog, quaternary (2-bit/cell), and binary synapses. A pre-trained neural
network with 97.5% inference accuracy on the MNIST dataset has been adopted as
the baseline. Process variation, flicker noise, and device aging
characterization have been performed and a statistical model has been developed
to capture all these effects during neural network simulation. Extrapolated
retention above 10 years has been achieved for the binary read-out procedure.
We have demonstrated that the impact of (1) retention degradation due to oxide
thickness scaling, (2) process variation, and (3) flicker noise can be abated
in ferroelectric FinFET-based binary neural networks, which exhibit superior
performance over quaternary and analog neural networks amidst all variations.
The performance of a neural network results from the combined performance of
device, architecture, and algorithm. This research corroborates the
applicability of deeply scaled ferroelectric FinFETs for non-von Neumann
computing with a proper combination of architecture and algorithm.
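A minimal sketch of the kind of statistical perturbation study described above (NumPy, with made-up variation magnitudes rather than the measured FinFET statistics): the same absolute Gaussian device-to-device disturbance is applied to an analog and to a binary version of one synaptic layer, and the relative output error is compared; the binary array, whose stored levels are far apart compared with the disturbance, loses much less relative accuracy, mirroring the robustness argument made for binary FeFinFET synapses:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(256)                       # one input activation vector
w_analog = rng.standard_normal((256, 10)) * 0.1    # ideal analog synaptic weights
w_binary = np.sign(w_analog)                       # 1-bit/cell version of the layer

def perturb(w, sigma):
    # Illustrative Gaussian variation model, not the measured device statistics.
    return w + rng.normal(0.0, sigma, size=w.shape)

y_analog, y_binary = x @ w_analog, x @ w_binary

sigma = 0.05                                       # same absolute disturbance
err_analog = np.abs(x @ perturb(w_analog, sigma) - y_analog).mean()
err_binary = np.abs(x @ perturb(w_binary, sigma) - y_binary).mean()

# Error relative to each array's own output magnitude: the binary layer is
# disturbed far less, in line with the robustness argument above.
print(err_analog / np.abs(y_analog).mean(), err_binary / np.abs(y_binary).mean())
```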
Efficient hardware implementations of bio-inspired networks
The human brain, with its massive computational capability and power efficiency in small form factor, continues to inspire the ultimate goal of building machines that can perform tasks without being explicitly programmed. In an effort to mimic the natural information processing paradigms observed in the brain, several neural network generations have been proposed over the years. Among the neural networks inspired by biology, second-generation Artificial or Deep Neural Networks (ANNs/DNNs) use memoryless neuron models and have shown unprecedented success surpassing humans in a wide variety of tasks. Unlike ANNs, third-generation Spiking Neural Networks (SNNs) closely mimic biological neurons by operating on discrete and sparse events in time called spikes, which are obtained by the time integration of previous inputs.
Implementation of data-intensive neural network models on computers based on the von Neumann architecture is mainly limited by the continuous data transfer between the physically separated memory and processing units. Hence, non-von Neumann architectural solutions are essential for processing these memory-intensive bio-inspired neural networks in an energy-efficient manner. Among the non-von Neumann architectures, implementations employing non-volatile memory (NVM) devices are most promising due to their compact size and low operating power. However, it is non-trivial to integrate these nanoscale devices on conventional computational substrates due to their non-idealities, such as limited dynamic range, finite bit resolution, programming variability, etc. This dissertation demonstrates the architectural and algorithmic optimizations of implementing bio-inspired neural networks using emerging nanoscale devices.
The first half of the dissertation focuses on the hardware acceleration of DNN implementations. A 4-layer stochastic DNN in a crossbar architecture with memristive devices at the cross point is analyzed for accelerating DNN training. This network is then used as a baseline to explore the impact of experimental memristive device behavior on network performance. Programming variability is found to have a critical role in determining network performance compared to other non-ideal characteristics of the devices. In addition, noise-resilient inference engines are demonstrated using stochastic memristive DNNs with 100 bits for stochastic encoding during inference and 10 bits for the expensive training.
The second half of the dissertation focuses on a novel probabilistic framework for SNNs using Generalized Linear Model (GLM) neurons for capturing neuronal behavior. This work demonstrates that probabilistic SNNs have comparable performance to equivalent ANNs on two popular benchmarks - handwritten-digit classification and human activity recognition. Considering the potential of SNNs for energy-efficient implementations, a hardware accelerator for inference is proposed, termed the Spintronic Accelerator for Probabilistic SNNs (SpinAPS). The learning algorithm is optimized for a hardware-friendly implementation and uses a first-to-spike decoding scheme for low-latency inference. With binary spintronic synapses and digital CMOS logic neurons for computations, SpinAPS achieves a performance improvement of 4x in terms of GSOPS/W/mm when compared to a conventional SRAM-based design.
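For readers unfamiliar with the GLM neuron and first-to-spike decoding mentioned above, the compact NumPy sketch below uses arbitrary filter lengths and random weights (hypothetical values, not the dissertation's trained parameters): the spiking probability at each time step is a sigmoid of the filtered input history plus a bias, spikes are drawn as Bernoulli samples, and the output class is whichever neuron fires first:

```python
import numpy as np

rng = np.random.default_rng(2)
T, N_IN, N_OUT, TAU = 20, 16, 4, 5               # steps, inputs, classes, filter taps

x = rng.integers(0, 2, size=(T, N_IN))           # binary input spike trains
w = rng.normal(0, 0.3, size=(N_OUT, N_IN, TAU))  # per-synapse temporal filters
b = rng.normal(0, 0.1, size=N_OUT)               # biases

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

first_spike = [None] * N_OUT
for t in range(T):
    hist = x[max(0, t - TAU + 1): t + 1][::-1]   # last TAU samples, newest first
    u = np.array([np.sum(w[k, :, :hist.shape[0]] * hist.T) + b[k]
                  for k in range(N_OUT)])         # GLM membrane potentials
    spikes = rng.random(N_OUT) < sigmoid(u)       # Bernoulli spike sampling
    for k in np.flatnonzero(spikes):
        if first_spike[k] is None:
            first_spike[k] = t

# First-to-spike decoding: the class whose output neuron fired earliest wins.
valid = [(t, k) for k, t in enumerate(first_spike) if t is not None]
print(min(valid)[1] if valid else "no spike")
```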
Collectively, this work demonstrates the potential of emerging memory technologies in building energy-efficient hardware architectures for deep and spiking neural networks. The design strategies adopted in this work can be extended to other spike- and non-spike-based systems for building embedded solutions with power and energy constraints.
Using the IBM Analog In-Memory Hardware Acceleration Kit for Neural Network Training and Inference
Analog In-Memory Computing (AIMC) is a promising approach to reduce the
latency and energy consumption of Deep Neural Network (DNN) inference and
training. However, the noisy and non-linear device characteristics and the
non-ideal peripheral circuitry in AIMC chips require DNNs to be adapted for
deployment on such hardware in order to achieve accuracy equivalent to digital computing.
In this tutorial, we provide a deep dive into how such adaptations can be
achieved and evaluated using the recently released IBM Analog Hardware
Acceleration Kit (AIHWKit), freely available at https://github.com/IBM/aihwkit.
The AIHWKit is a Python library that simulates inference and training of DNNs
using AIMC. We present an in-depth description of the AIHWKit design,
functionality, and best practices to properly perform inference and training.
We also present an overview of the Analog AI Cloud Composer, which provides the
benefits of using the AIHWKit simulation platform in a fully managed cloud
setting. Finally, we show examples of how users can expand and customize
AIHWKit for their own needs. This tutorial is accompanied by comprehensive
Jupyter Notebook code examples that can be run using AIHWKit, which can be
downloaded from https://github.com/IBM/aihwkit/tree/master/notebooks/tutorial
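As a starting point, the snippet below follows the pattern of the getting-started example in the AIHWKit documentation; the class and module names are taken from the library's public API but may shift between releases, so treat this as a sketch rather than the tutorial's exact notebooks. It trains a small analog layer with the analog-aware SGD optimizer against a toy regression target:

```python
import torch
from aihwkit.nn import AnalogLinear
from aihwkit.optim import AnalogSGD
from aihwkit.simulator.configs import SingleRPUConfig
from aihwkit.simulator.configs.devices import ConstantStepDevice

# Toy inputs and targets.
x = torch.tensor([[0.1, 0.2, 0.4, 0.3], [0.2, 0.1, 0.1, 0.3]])
y = torch.tensor([[1.0, 0.5], [0.7, 0.3]])

# Analog fully connected layer; the RPU config selects the simulated device model.
model = AnalogLinear(4, 2, rpu_config=SingleRPUConfig(device=ConstantStepDevice()))

# Analog-aware SGD applies weight updates through the simulated device behavior.
opt = AnalogSGD(model.parameters(), lr=0.1)
opt.regroup_param_groups(model)

for epoch in range(10):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()
    print(epoch, loss.item())
```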