
    X-SRAM: Enabling In-Memory Boolean Computations in CMOS Static Random Access Memories

    Silicon-based Static Random Access Memories (SRAM) and digital Boolean logic have been the workhorses of state-of-the-art computing platforms. Despite tremendous strides in scaling the ubiquitous metal-oxide-semiconductor transistor, the underlying von Neumann computing architecture has remained unchanged. The limited throughput and energy efficiency of state-of-the-art computing systems result, to a large extent, from the well-known von Neumann bottleneck. The energy and throughput inefficiency of von Neumann machines has been accentuated in recent times by the emphasis on data-intensive applications such as artificial intelligence and machine learning. A possible approach towards mitigating the overhead associated with the von Neumann bottleneck is to enable in-memory Boolean computations. In this manuscript, we present an augmented version of the conventional SRAM bit-cell, called the X-SRAM, with the ability to perform in-memory, vector Boolean computations in addition to the usual memory storage operations. We propose at least six different schemes for enabling in-memory vector computations, including NAND, NOR, IMP (implication), and XOR logic gates, with respect to different bit-cell topologies: the 8T cell and the 8+T differential cell. In addition, we present a novel 'read-compute-store' scheme, wherein the computed Boolean function can be directly stored in the memory without the need for latching the data and carrying out a subsequent write operation. The feasibility of the proposed schemes has been verified using predictive transistor models and Monte Carlo variation analysis. (This article has been accepted for publication in a future issue of IEEE Transactions on Circuits and Systems I: Regular Papers.)
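
    The vector Boolean operations and the 'read-compute-store' idea above can be illustrated with a purely functional model. The Python sketch below is a hypothetical bit-level emulation (the ToySRAM class, word width, and example words are invented for illustration); it mirrors the logical behaviour of element-wise NAND/NOR/IMP/XOR on two stored words and writes the result straight back into the array, and does not model the 8T/8+T circuit-level schemes from the paper.

```python
# Minimal functional sketch of in-memory vector Boolean operations.
# It emulates the logical behaviour (element-wise NAND/NOR/IMP/XOR on two
# stored words); it does not model the 8T/8+T bit-cell circuits.

WORD_BITS = 8
MASK = (1 << WORD_BITS) - 1

def nand(a, b): return ~(a & b) & MASK
def nor(a, b):  return ~(a | b) & MASK
def imp(a, b):  return (~a | b) & MASK      # material implication: a -> b
def xor(a, b):  return (a ^ b) & MASK

class ToySRAM:
    """Word-addressable array supporting a 'read-compute-store' style operation."""
    def __init__(self, words):
        self.mem = list(words)

    def compute_store(self, row_a, row_b, row_dst, op):
        # Read two rows, compute the Boolean function, and store the result
        # directly back into the array (no external latch or separate write).
        self.mem[row_dst] = op(self.mem[row_a], self.mem[row_b])
        return self.mem[row_dst]

sram = ToySRAM([0b10101100, 0b11001010, 0])
print(bin(sram.compute_store(0, 1, 2, nand)))  # 0b1110111
print(bin(sram.compute_store(0, 1, 2, xor)))   # 0b1100110
```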

    Design Space Exploration and Comparative Evaluation of Memory Technologies for Synaptic Crossbar Arrays: Device-Circuit Non-Idealities and System Accuracy

    In-memory computing (IMC) utilizing synaptic crossbar arrays is promising for deep neural networks to attain high energy efficiency and integration density. Towards that end, various CMOS and post-CMOS technologies have been explored as promising synaptic device candidates, including SRAM, ReRAM, FeFET, and SOT-MRAM. However, each of these technologies has its own pros and cons, which need to be comparatively evaluated in the context of synaptic array design. For a fair comparison, such an analysis must carefully optimize each technology specifically for synaptic crossbar design, accounting for device and circuit non-idealities in crossbar arrays such as variations, wire resistance, and driver/sink resistance. In this work, we perform a comprehensive design space exploration and comparative evaluation of different technologies at the 7nm technology node for synaptic crossbar arrays, in the context of IMC robustness and system accuracy. Firstly, we integrate different technologies into a cross-layer simulation flow based on physics-based models of synaptic devices and interconnects. Secondly, we optimize both technology-agnostic design knobs, such as input encoding and ON-resistance, as well as technology-specific design parameters, including ferroelectric thickness in FeFET and MgO thickness in SOT-MRAM. Our optimization methodology accounts for the implications of device- and circuit-level non-idealities on the system-level accuracy for each technology. Finally, based on the optimized designs, we obtain inference results for ResNet-20 on the CIFAR-10 dataset and show that FeFET-based crossbar arrays achieve the highest accuracy due to their compactness, low leakage, and high ON/OFF current ratio.
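
    The core operation being optimized here is the analog matrix-vector multiply performed by a crossbar, where each output current is I_j = sum_i G_ij * V_i. The NumPy sketch below is a minimal illustration, under assumed numbers, of how an ON-resistance knob and device-to-device conductance variation perturb that ideal computation; it is not the paper's physics-based cross-layer simulation flow, and the variation model, resistance value, and array size are placeholders.

```python
# Minimal sketch of an analog crossbar matrix-vector multiply with a crude
# device-variation non-ideality. The conductance mapping, variation level, and
# ON-resistance value are illustrative assumptions, not the paper's models.
import numpy as np

rng = np.random.default_rng(0)

def crossbar_mvm(weights, v_in, r_on=1e5, sigma=0.05):
    """Map weights in [0, 1] to conductances in [0, 1/r_on], apply lognormal
    device variation, and return output currents I_j = sum_i G_ij * V_i."""
    g_on = 1.0 / r_on
    g_ideal = weights * g_on
    g_real = g_ideal * rng.lognormal(mean=0.0, sigma=sigma, size=weights.shape)
    return v_in @ g_real                             # per-column currents (A)

weights = rng.uniform(0.0, 1.0, size=(64, 16))       # 64 rows x 16 columns
v_in = rng.uniform(0.0, 0.2, size=64)                # input voltages (V)

i_ideal = v_in @ (weights / 1e5)
i_real = crossbar_mvm(weights, v_in)
rel_err = np.linalg.norm(i_real - i_ideal) / np.linalg.norm(i_ideal)
print(f"relative MVM error from device variation: {rel_err:.3f}")
```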

    A Survey on Layout Implementation and Analysis of Different SRAM Cell Topologies

    Because battery-powered devices are ubiquitous, a primary goal of electronics design is low power consumption. Owing to its role in low-energy computing, memory cell operation at low supply voltage has become a major focus of memory cell design. Given the specification changes that accompany scaled process technologies, stable operation is the critical requirement for successful low-voltage SRAM design. The traditional SRAM cell enables high density and fast differential sensing but suffers from half-select and read-disturb issues. Compared to existing 6T, 8T, and 10T topologies, previous strategies for solving these problems have proven ineffective due to low efficiency, data-dependent leakage, and high energy per access, even where the traditional cell's read-disturb problem is addressed. Our primary goals are to reduce power consumption, improve read performance, and reduce the area of the proposed cell design. The simulation results show that the proposed design provides the fastest read operation and the best overall power-delay-product optimization; a worked example of this metric follows below. The proposed leakage-reduction circuit has been implemented in the Microwind tool. Delay and power consumption are key factors in memory cell performance, and the primary goal of this work is to create a low-power SRAM cell.
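
    Since the comparison above hinges on the power-delay product (PDP = average power x access delay), a tiny worked example helps fix the metric. The per-cell numbers below are invented placeholders, not results from the survey.

```python
# Tiny worked example of the power-delay-product (PDP) metric used to compare
# SRAM cell topologies. The per-cell numbers are illustrative placeholders.
cells = {
    "6T":  {"power_uW": 4.2, "delay_ns": 0.85},
    "8T":  {"power_uW": 3.6, "delay_ns": 0.70},
    "10T": {"power_uW": 3.1, "delay_ns": 0.95},
}

for name, c in cells.items():
    pdp_fj = c["power_uW"] * c["delay_ns"]   # uW * ns = fJ
    print(f"{name}: PDP = {pdp_fj:.2f} fJ")
```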

    An ultra-low power in-memory computing cell for binarized neural networks

    Deep Neural Networks (DNNs) are widely used in many artificial intelligence applications such as image classification and image recognition. Data movement in DNNs results in increased power consumption. The primary reason behind the energy-expensive data movement in DNNs is the conventional von Neumann architecture, in which the computing unit and the memory are physically separated. To address the issue of energy-expensive data movement in DNNs, in-memory computing schemes have been proposed in the literature. The fundamental principle behind in-memory computing is to enable vector computations closer to the memory. In-memory computing schemes based on CMOS technologies are of great importance nowadays due to the ease of mass production and commercialization. However, many of the proposed in-memory computing schemes suffer from power and performance degradation. Besides, some of them are capable of reducing power consumption only to a small extent, and this requires sacrificing the overall signal-to-noise ratio (SNR). This thesis discusses an efficient In-Memory Computing (IMC) cell for Binarized Neural Networks (BNNs). The IMC cell was modelled using a simple current-based computing method, and the developed cell offers a practical solution to the energy-expensive data movement within BNNs. A 4-bit Digital-to-Analog Converter (DAC) is designed and simulated using a 130nm CMOS process, and the functionality of the IMC scheme for BNNs is demonstrated using this DAC. The optimised 4-bit DAC proves to be an effective vehicle for IMC in BNNs. The results presented in this thesis show that this IMC approach is capable of accurately performing the dot product between the input activations and the weights. Furthermore, the 4-bit DAC provides 4-bit weight precision, which offers an effective means to improve overall accuracy.
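
    The arithmetic being off-loaded to the IMC cell is a dot product between binarized (+1/-1) activations and weights quantized to 4-bit precision. The sketch below is a functional model only, with an assumed uniform quantization range and invented example vectors; it illustrates the effect of 4-bit weight quantization on the dot product, not the thesis's 130nm analog DAC circuit.

```python
# Functional sketch of the dot product a BNN IMC cell would compute: binary
# (+1/-1) activations multiplied by weights quantized through a 4-bit DAC.
# The quantization range and scaling are assumed for illustration only.
import numpy as np

def dac_4bit(weights, w_max=1.0):
    """Quantize weights in [-w_max, w_max] to 16 uniformly spaced DAC levels."""
    levels = 2 ** 4
    step = 2 * w_max / (levels - 1)
    codes = np.clip(np.round((weights + w_max) / step), 0, levels - 1)
    return codes * step - w_max          # reconstructed analog weight values

rng = np.random.default_rng(1)
activations = rng.choice([-1.0, 1.0], size=256)     # binarized input activations
weights = rng.uniform(-1.0, 1.0, size=256)

exact = activations @ weights
quant = activations @ dac_4bit(weights)
print(f"exact dot product: {exact:+.3f}, 4-bit DAC dot product: {quant:+.3f}")
```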

    HyDe: A Hybrid PCM/FeFET/SRAM Device-search for Optimizing Area and Energy-efficiencies in Analog IMC Platforms

    Today, there is a plethora of In-Memory Computing (IMC) devices (SRAMs, PCMs, and FeFETs) that emulate convolutions on crossbar arrays with high throughput. Each IMC device offers its own pros and cons during inference of Deep Neural Networks (DNNs) on crossbars in terms of area overhead, programming energy, and non-idealities. A design-space exploration is, therefore, imperative to derive a hybrid-device architecture optimized for accurate DNN inference under the impact of non-idealities from multiple devices, while maintaining competitive area and energy efficiencies. We propose a two-phase search framework (HyDe) that exploits the best of all worlds offered by multiple devices to determine an optimal hybrid-device architecture for a given DNN topology. Our hybrid models achieve up to 2.30-2.74x higher TOPS/mm^2 at 22-26% higher energy efficiencies than baseline homogeneous models for a VGG16 DNN topology. We further propose a feasible implementation of the HyDe-derived hybrid-device architectures in the 2.5D design space using chiplets, to reduce design effort and cost in hardware fabrication involving multiple technology processes. (Accepted to the IEEE Journal on Emerging and Selected Topics in Circuits and Systems (JETCAS).)
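
    To make the idea of a hybrid-device search concrete, the toy sketch below exhaustively assigns one device per layer to minimize a weighted cost over area, energy, and an accuracy penalty. It is only a conceptual stand-in: the per-device numbers, cost weights, and layer sizes are invented, and the real HyDe framework uses a two-phase search with device- and circuit-level non-ideality models rather than this brute-force scoring.

```python
# Toy layer-wise device-assignment search in the spirit of a hybrid-device
# exploration: pick one IMC device per layer to minimize a weighted cost of
# area, energy, and an accuracy penalty. All numbers are made-up placeholders,
# not HyDe's models or results.
from itertools import product

DEVICES = {                    # (relative area, relative energy, accuracy penalty)
    "SRAM":  (1.00, 0.60, 0.00),
    "PCM":   (0.30, 1.00, 0.04),
    "FeFET": (0.25, 0.40, 0.02),
}
LAYER_SIZES = [64, 128, 256, 256]   # crossbar "size" proxy per layer

def cost(assignment, w_area=1.0, w_energy=1.0, w_acc=50.0):
    area = sum(DEVICES[d][0] * s for d, s in zip(assignment, LAYER_SIZES))
    energy = sum(DEVICES[d][1] * s for d, s in zip(assignment, LAYER_SIZES))
    acc_pen = sum(DEVICES[d][2] for d in assignment)
    return w_area * area + w_energy * energy + w_acc * acc_pen

# Exhaustive search over per-layer device choices (3^4 combinations here).
best = min(product(DEVICES, repeat=len(LAYER_SIZES)), key=cost)
print("best per-layer devices:", best, "cost:", round(cost(best), 2))
```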