
    Memory-Immersed Collaborative Digitization for Area-Efficient Compute-in-Memory Deep Learning

    This work discusses memory-immersed collaborative digitization among compute-in-memory (CiM) arrays to minimize the area overhead of a conventional analog-to-digital converter (ADC) for deep learning inference. Using the proposed scheme, significantly more CiM arrays can therefore be accommodated within limited-footprint designs to improve parallelism and minimize external memory accesses. Under the digitization scheme, CiM arrays exploit their parasitic bit lines to form a within-memory capacitive digital-to-analog converter (DAC) that facilitates area-efficient successive approximation (SA) digitization. CiM arrays collaborate so that a proximal array digitizes the analog-domain product sums while another array computes the scalar product of inputs and weights. We discuss various networking configurations among CiM arrays where Flash, SA, and hybrid digitization steps can be efficiently implemented using the proposed memory-immersed scheme. The results are demonstrated using a 65 nm CMOS test chip. Compared to a 40 nm-node 5-bit SAR ADC, our 65 nm design requires ∼25× less area and ∼1.4× less energy by leveraging in-memory computing structures. Compared to a 40 nm-node 5-bit Flash ADC, our design requires ∼51× less area and ∼13× less energy.
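
    As a point of reference, the sketch below is a behavioral Python model of successive-approximation digitization driven by an ideal capacitive DAC, which is the role the parasitic bit-line capacitance of a proximal CiM array plays in this scheme. The function name, reference voltage, and 5-bit resolution are illustrative assumptions, not details of the test chip.

        def sa_digitize(v_in, v_ref=1.0, n_bits=5):
            """Binary-search the n_bits code whose DAC level best matches v_in."""
            code = 0
            for bit in reversed(range(n_bits)):
                trial = code | (1 << bit)              # tentatively set the next bit
                v_dac = v_ref * trial / (1 << n_bits)  # ideal capacitive-DAC output for that code
                if v_in >= v_dac:                      # comparator decision
                    code = trial                       # keep the bit when the DAC level is below the input
            return code

        # Example: a product-sum at 0.63*Vref digitizes to code 20 (20/32 = 0.625).
        print(sa_digitize(0.63))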

    Efficient, Mixed Precision In-Memory Deep Learning at the Edge

    Deep neural networks (DNNs) have shown remarkable prediction accuracy in many practical applications. DNNs in these applications typically utilize thousands to millions of parameters (i.e., weights) and are trained over a huge number of example patterns. Operating over such a large parametric space, carefully orchestrated over multiple abstraction levels (i.e., hidden layers), gives DNNs superior generalization and learning capacity, but it also presents critical inference constraints, especially for real-time and/or low-power applications. This thesis proposes approaches for low-energy implementation of DNN accelerators for edge applications. We propose a co-design approach for compute-in-memory DNN inference. We use multiplication-free function approximators based on the ℓ1 norm along with a co-adapted processing array and compute flow. Using this approach, we overcome many deficiencies in the current art of in-SRAM DNN processing, such as the need for digital-to-analog converters (DACs) at each operating SRAM row/column, the need for high-precision analog-to-digital converters (ADCs), limited support for multi-bit-precision weights, and limited vector-scale parallelism. Our co-adapted implementation seamlessly extends to multi-bit-precision weights, does not require DACs, and easily extends to higher vector-scale parallelism. We also propose an SRAM-immersed successive approximation ADC (SA-ADC) that exploits the parasitic capacitance of the bit lines of the SRAM array as a capacitive DAC. Since the dominant area overhead of an SA-ADC comes from its capacitive DAC, exploiting the intrinsic parasitics of the SRAM array allows a low-area implementation of the within-SRAM SA-ADC. The thesis also explores automation algorithms for searching for energy-optimized neural architectures.
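
    A minimal sketch of an ℓ1-norm-based multiplication-free surrogate for the dot product is given below, assuming the commonly used operator sign(x)·|w| + sign(w)·|x|; the thesis's co-adapted approximator and compute flow may differ in form. In hardware the sign terms reduce to conditional negation and addition, so no multiplier is required; the multiplications in the model only emulate that behavior.

        import numpy as np

        def mf_dot(x, w):
            """Multiplication-free surrogate of a dot product (sign/abs accumulation)."""
            # sign(x)*|w| is a conditional negation in hardware; numpy's * merely models it.
            return np.sum(np.sign(x) * np.abs(w) + np.sign(w) * np.abs(x))

        x = np.array([0.5, -1.0, 2.0])
        print(mf_dot(x, x))  # 7.0 == 2 * ||x||_1, which is why the operator is described as l1-norm based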

    Low Power Restricted Boltzmann Machine Using Mixed-Mode Magneto-Tunneling Junctions

    This letter discusses a mixed-mode magneto-tunneling junction (m-MTJ)-based restricted Boltzmann machine (RBM). RBMs are unsupervised learning models suitable for extracting features from high-dimensional data. The m-MTJ is actuated by the simultaneous actions of voltage-controlled magnetic anisotropy and voltage-controlled spin-transfer torque, where the switching of the free layer is probabilistic and can be controlled by the two mechanisms. Using m-MTJ-based activation functions, we present a novel low-area/low-power RBM. We discuss online learning of the presented implementation to negate process variability. For the MNIST handwritten digit dataset, the design achieves ∼96% accuracy under the expected variability in various components.
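
    A minimal sketch of the RBM hidden-layer update that such a probabilistic device can realize is shown below: each hidden unit switches on with a probability set by its pre-activation. Modeling the m-MTJ switching probability as a logistic function, and the array sizes used here, are illustrative assumptions rather than the letter's device model.

        import numpy as np

        rng = np.random.default_rng(0)

        def sample_hidden(v, W, b):
            """Sample binary hidden units h ~ Bernoulli(sigmoid(v @ W + b))."""
            p_on = 1.0 / (1.0 + np.exp(-(v @ W + b)))      # per-unit switching probability
            return (rng.random(p_on.shape) < p_on).astype(np.int8)

        # Example: 6 visible units driving 4 hidden units with small random weights.
        v = rng.integers(0, 2, size=6)
        W = rng.normal(scale=0.1, size=(6, 4))
        print(sample_hidden(v, W, np.zeros(4)))            # stochastic binary activations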