Neuromorphic Hebbian learning with magnetic tunnel junction synapses
Neuromorphic computing aims to mimic both the function and structure of
biological neural networks to provide artificial intelligence with extreme
efficiency. Conventional approaches store synaptic weights in non-volatile
memory devices with analog resistance states, permitting in-memory computation
of neural network operations while avoiding the costs associated with
transferring synaptic weights from a memory array. However, the use of analog
resistance states for storing weights in neuromorphic systems is impeded by
stochastic writing, weight drift over time through stochastic processes, and
limited endurance, all of which reduce the precision of synaptic weights. Here we
propose and experimentally demonstrate neuromorphic networks that provide
high-accuracy inference thanks to the binary resistance states of magnetic
tunnel junctions (MTJs), while leveraging the analog nature of their stochastic
spin-transfer torque (STT) switching for unsupervised Hebbian learning. We
performed the first experimental demonstration of a neuromorphic network
directly implemented with MTJ synapses, for both inference and
spike-timing-dependent plasticity learning. We also demonstrated through
simulation that the proposed system for unsupervised Hebbian learning with
stochastic STT-MTJ synapses can achieve competitive accuracies for MNIST
handwritten digit recognition. By appropriately applying neuromorphic
principles through hardware-aware design, the proposed STT-MTJ neuromorphic
learning networks provide a pathway toward artificial intelligence hardware
that learns autonomously with extreme efficiency.
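A minimal Python sketch may help make the core mechanism concrete. It is not the authors' implementation: the exponential pulse-width switching model, the layer sizes, and the conductance values are illustrative assumptions. Causal pre/post spike pairs trigger write pulses whose stochastic STT switching serves as the analog learning signal, while inference reads only the binary states.

    import numpy as np

    rng = np.random.default_rng(0)

    N_PRE, N_POST = 784, 10       # illustrative layer sizes (e.g., MNIST inputs)
    G_LOW, G_HIGH = 1.0, 2.0      # binary MTJ conductance states (arbitrary units)

    # Each synapse is one MTJ in a binary state: 0 -> G_LOW, 1 -> G_HIGH.
    state = rng.integers(0, 2, size=(N_PRE, N_POST))

    def switching_prob(pulse_width, tau=10e-3):
        # Assumed exponential switching statistics; real STT switching also
        # depends on bias voltage, temperature, and device variation.
        return 1.0 - np.exp(-pulse_width / tau)

    def stdp_update(pre_spikes, post_spikes, window=20e-3):
        # pre_spikes / post_spikes: lists of (neuron_index, spike_time) pairs.
        for i, t_pre in pre_spikes:
            for j, t_post in post_spikes:
                dt = t_post - t_pre
                if 0.0 < dt < window:        # causal pair -> try to potentiate
                    if rng.random() < switching_prob(window - dt):
                        state[i, j] = 1
                elif -window < dt < 0.0:     # anti-causal pair -> try to depress
                    if rng.random() < switching_prob(window + dt):
                        state[i, j] = 0

    def infer(x):
        # Inference uses only the binary conductances, so read-out precision
        # is unaffected by the stochasticity of individual write events.
        G = np.where(state == 1, G_HIGH, G_LOW)
        return x @ G

Because the weights are stored as binary states, inference accuracy does not degrade even though each write is probabilistic, which is the property the abstract leverages.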
Architecture-accuracy co-optimization of ReRAM-based low-cost neural network processor
Resistive RAM (ReRAM) is a promising technology with advantages such as small device size and in-memory-computing capability. However, designing optimal AI processors based on ReRAMs is challenging due to the limited precision and the complex interplay between quality of result and hardware efficiency. In this paper we present a study targeting a low-power, low-cost image classification application. We discover that the trade-off between accuracy and hardware efficiency in ReRAM-based hardware is not obvious and can even be surprising, and our solution, developed for a recently fabricated ReRAM device, achieves both state-of-the-art efficiency and empirical assurance of a high quality of result.
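To illustrate why this trade-off has to be measured rather than assumed, the toy Python sweep below quantizes a random linear layer to several weight bit-widths (a stand-in for the limited precision of ReRAM cells) and reports how often its predictions agree with the full-precision reference. The model, sizes, and sweep are placeholders, not the paper's fabricated device or benchmark.

    import numpy as np

    rng = np.random.default_rng(0)

    def quantize(w, n_bits):
        # Uniform symmetric quantization to the levels a ReRAM cell
        # (or group of cells) can reliably store; assumes n_bits >= 2.
        qmax = 2 ** (n_bits - 1) - 1
        scale = np.max(np.abs(w)) / qmax
        return np.clip(np.round(w / scale), -qmax, qmax) * scale

    # Toy stand-in for a trained classifier layer; the point is the sweep.
    w = rng.normal(size=(256, 10))
    x = rng.normal(size=(1000, 256))
    y_ref = (x @ w).argmax(axis=1)    # full-precision predictions as reference

    for n_bits in (2, 3, 4, 6, 8):
        y_q = (x @ quantize(w, n_bits)).argmax(axis=1)
        # Cells per weight under bit-slicing is one crude hardware-cost proxy.
        print(f"{n_bits}-bit weights: {np.mean(y_q == y_ref):.3f} agreement")

In practice the accuracy curve of such a sweep is rarely monotonic in cost, which is why the paper co-optimizes architecture and accuracy rather than tuning either in isolation.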
A Construction Kit for Efficient Low Power Neural Network Accelerator Designs
Implementing embedded neural network processing at the edge requires
efficient hardware acceleration that couples high computational performance
with low power consumption. Driven by the rapid evolution of network
architectures and their algorithmic features, accelerator designs are
constantly updated and improved. To evaluate and compare hardware design
choices, designers can refer to a myriad of accelerator implementations in the
literature. Surveys provide an overview of these works but are often limited to
system-level and benchmark-specific performance metrics, making it difficult to
quantitatively compare the individual effect of each utilized optimization
technique. This complicates the evaluation of optimizations for new accelerator
designs, slowing down research progress. This work provides a survey of
neural network accelerator optimization approaches that have been used in
recent works and reports their individual effects on edge processing
performance. It presents the list of optimizations and their quantitative
effects as a construction kit, allowing designers to assess the design choices
for each building block separately. Reported optimizations range from up to
10,000x memory savings to 33x energy reductions, providing chip designers with
an overview
of design choices for implementing efficient low power neural network
accelerators.
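The construction-kit idea lends itself to a simple programmatic sketch: treat each optimization as a building block with an estimated multiplicative effect per metric, then compose a candidate design from blocks. The names and factors below are invented placeholders for illustration; the per-technique figures should come from the survey itself.

    from dataclasses import dataclass

    @dataclass
    class Optimization:
        # One building block of the kit: a technique and its estimated
        # multiplicative effect on each metric (>1 means a reduction).
        name: str
        memory_factor: float
        energy_factor: float

    # Placeholder entries; substitute the survey's per-technique figures.
    KIT = [
        Optimization("8-bit quantization", memory_factor=4.0, energy_factor=3.0),
        Optimization("weight pruning",     memory_factor=5.0, energy_factor=2.0),
        Optimization("local data reuse",   memory_factor=1.0, energy_factor=2.5),
    ]

    def estimate(design):
        # First-order composition: assumes independent, multiplicative effects.
        mem = energy = 1.0
        for opt in design:
            mem *= opt.memory_factor
            energy *= opt.energy_factor
        return mem, energy

    mem, energy = estimate(KIT)
    print(f"combined estimate: {mem:.0f}x memory, {energy:.1f}x energy reduction")

The independence assumption is the weak point of any such first-order estimate; the survey's per-block reporting is precisely what lets designers refine it.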
Architecture and Circuit Design Optimization for Compute-In-Memory
The objective of the proposed research is to optimize compute-in-memory (CIM) design for accelerating Deep Neural Network (DNN) algorithms. As compute peripheries such as analog-to-digital converters (ADCs) introduce significant overhead in CIM inference design, the research first focuses on circuit optimization for inference acceleration and proposes a resistive random access memory (RRAM) based ADC-free in-memory compute scheme. We comprehensively explore the trade-offs among different types of ADCs and investigate a new ADC design especially suited for CIM, which performs an analog shift-add over multiple weight significance bits, improving throughput and energy efficiency under similar area constraints. Furthermore, we prototype an ADC-free CIM inference chip with fully analog data processing between sub-arrays, which significantly improves hardware performance over conventional CIM designs and achieves near-software classification accuracy on the ImageNet and CIFAR-10/-100 datasets.

Second, the research focuses on hardware support for CIM on-chip training. To maximize hardware reuse of the CIM weight-stationary dataflow, we propose CIM training architectures with a transpose weight mapping strategy. The cell design and periphery circuitry are modified to efficiently support bi-directional compute, and a novel solution for signed-number multiplication is proposed to handle negative inputs in backpropagation. Finally, we propose an SRAM-based CIM training architecture and comprehensively explore system-level hardware performance for DNN on-chip training based on silicon measurement results.
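The shift-add over weight significance bits can be emulated digitally to check the arithmetic; on the proposed chip, this accumulation happens in the analog domain without per-column ADCs. The Python sketch below uses illustrative sizes and standard two's-complement bit-slicing with an offset correction; it is not the chip's exact circuit.

    import numpy as np

    rng = np.random.default_rng(0)

    N_BITS = 4                               # weight bits, one RRAM slice per bit
    x = rng.integers(0, 4, size=16)          # digitized input vector
    w = rng.integers(-8, 8, size=(16, 8))    # signed 4-bit weights

    # Offset-binary encoding so each significance bit is a 0/1 slice that a
    # binary RRAM sub-array can store.
    w_offset = w + 2 ** (N_BITS - 1)
    slices = [(w_offset >> k) & 1 for k in range(N_BITS)]

    # Each slice yields one partial matrix-vector product (column currents on
    # chip); shift-add weights each partial sum by its bit significance.
    acc = sum((x @ s) << k for k, s in enumerate(slices))

    # Remove the encoding offset to recover the signed result.
    result = acc - 2 ** (N_BITS - 1) * x.sum()
    assert np.array_equal(result, x @ w)     # matches the exact digital MAC

Performing this accumulation in analog is what removes the per-column ADC from the critical path, at the cost of the analog noise and variation budget the dissertation's circuit work addresses.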