Memory-Immersed Collaborative Digitization for Area-Efficient Compute-in-Memory Deep Learning
This work discusses memory-immersed collaborative digitization among
compute-in-memory (CiM) arrays to minimize the area overheads of a conventional
analog-to-digital converter (ADC) for deep learning inference. The proposed
scheme thus allows significantly more CiM arrays to be accommodated within a
limited footprint, improving parallelism and minimizing external memory
accesses. Under the digitization scheme, CiM arrays exploit their parasitic bit
lines to form a within-memory capacitive digital-to-analog converter (DAC) that
facilitates area-efficient successive approximation (SA) digitization. CiM
arrays collaborate: while one array computes the scalar product of inputs and
weights, a proximal array digitizes the resulting analog-domain product-sums. We
discuss various networking configurations among CiM arrays where Flash, SA, and
their hybrid digitization steps can be efficiently implemented using the
proposed memory-immersed scheme. The results are demonstrated using a 65 nm
CMOS test chip. Compared to a 40 nm-node 5-bit SAR ADC, our 65 nm design
requires 25× less area and 1.4× less energy by
leveraging in-memory computing structures. Compared to a 40 nm-node 5-bit Flash
ADC, our design requires 51× less area and 13× less
energy.
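The SA digitization step described above amounts to a bit-by-bit binary search against a capacitive DAC. A minimal behavioral sketch, assuming an idealized binary-weighted DAC (in the paper's scheme the DAC is instead formed from the parasitic bit-line capacitance of a neighboring CiM array; the names and values below are illustrative):

```python
def sa_digitize(v_in, v_ref=1.0, n_bits=5):
    """Binary-search an n_bits code whose DAC output best tracks v_in."""
    code = 0
    for bit in reversed(range(n_bits)):
        trial = code | (1 << bit)              # tentatively set this bit
        v_dac = v_ref * trial / (1 << n_bits)  # ideal capacitive-DAC output
        if v_in >= v_dac:                      # comparator decision
            code = trial                       # keep the bit
    return code

print(sa_digitize(0.6))  # -> 19, i.e. floor(0.6 * 32)
```

Each of the 5 comparator decisions resolves one bit, which is why the area cost is dominated by the DAC rather than by comparison logic.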
Efficient, Mixed-Precision In-Memory Deep Learning at the Edge
Deep neural networks (DNNs) have shown remarkable prediction accuracy in many practical applications. DNNs in these applications typically utilize thousands to millions of parameters
(i.e., weights) and are trained over a huge number of example patterns. Operating over such a large parametric space, which is carefully orchestrated over multiple abstraction levels (i.e., hidden layers), facilitates DNNs with a superior generalization and learning capacity but also
presents critical inference constraints, especially when considering real-time and/or low-power applications. This thesis proposes approaches for low-energy implementation for DNN
accelerators for edge applications. We propose a co-design approach for compute-in-memory
inference for DNNs. We use multiplication-free function approximators
based on the ℓ1 norm, along with a co-adapted processing array and compute flow. Using
this approach, we overcome many deficiencies in the current art of in-SRAM DNN processing, such
as the need for digital-to-analog converters (DACs) at each operating SRAM row/column, the
need for high precision analog-to-digital converters (ADCs), limited support for multi-bit precision weights, and limited vector-scale parallelism. Our co-adapted implementation
seamlessly extends to multi-bit precision weights, doesn’t require DACs, and easily extends to
higher vector-scale parallelism. We also propose an SRAM-immersed successive approximation
ADC (SA-ADC), where we exploit the parasitic capacitance of the bit lines of the SRAM array as a capacitive DAC. Since the dominant area overhead in an SA-ADC comes
from its capacitive DAC, exploiting the intrinsic parasitics of the SRAM array allows a low-area implementation of the within-SRAM SA-ADC. The thesis also explores automation algorithms for
searching for energy-optimized neural architectures.
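To illustrate the multiplication-free idea, here is a sketch using a common ℓ1-norm-based operator from the literature, which replaces each product w·x with sign(w·x)(|w| + |x|), computable with only sign manipulation and addition. This is an assumed stand-in; the thesis's exact approximator and compute flow may differ:

```python
import numpy as np

def mf_op(w, x):
    """Multiplication-free surrogate for elementwise w*x:
    sign(w)*x + sign(x)*w == sign(w*x) * (|w| + |x|) for nonzero inputs."""
    return np.sign(w) * x + np.sign(x) * w

def mf_dot(w, x):
    """Multiplication-free surrogate for the scalar product w . x."""
    return mf_op(w, x).sum()

w = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 1.0, -0.5])
print(mf_dot(w, x))  # 1.5 + (-2.0) + (-2.5) = -3.0
```

Because the operator preserves the sign of the true product and grows with the operand magnitudes, a network co-adapted to it can learn comparable decision boundaries without any hardware multipliers.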
Low Power Restricted Boltzmann Machine Using Mixed-Mode Magneto-Tunneling Junctions
This letter discusses a mixed-mode magneto-tunneling junction (m-MTJ)-based restricted Boltzmann machine (RBM). RBMs are unsupervised learning models, suitable for extracting features from high-dimensional data. The m-MTJ is actuated by the simultaneous actions of voltage-controlled magnetic anisotropy and voltage-controlled spin-transfer torque, where the switching of the free layer is probabilistic and can be controlled by the two mechanisms. Using m-MTJ-based activation functions, we present a novel low-area/power RBM. We discuss online learning of the presented implementation to negate process variability. For the MNIST handwritten digit dataset, the design achieves ∼96% accuracy under expected variability in various components.
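The role the probabilistic m-MTJ plays in the RBM can be sketched in software: each hidden unit fires stochastically with a sigmoid probability of its input, which is exactly what the probabilistic free-layer switching realizes in hardware. A minimal sampling step, with illustrative dimensions and weights (not from the letter):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sample_hidden(v, W, b_h):
    """Sample binary hidden units given visible units v (one Gibbs half-step)."""
    p = sigmoid(v @ W + b_h)                    # firing probability per unit
    return (rng.random(p.shape) < p).astype(int), p

W = rng.normal(0.0, 0.1, size=(6, 3))           # 6 visible x 3 hidden weights
b_h = np.zeros(3)                               # hidden biases
v = rng.integers(0, 2, size=6)                  # a binary visible vector
h, p = sample_hidden(v, W, b_h)
print(h)                                        # a binary hidden sample
```

In the m-MTJ implementation, the comparison of a random draw against the sigmoid probability is performed physically by the voltage-controlled stochastic switching, rather than by a random-number generator.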