X-SRAM: Enabling In-Memory Boolean Computations in CMOS Static Random Access Memories
Silicon-based Static Random Access Memories (SRAMs) and digital Boolean logic
have been the workhorses of state-of-the-art computing platforms. Despite
tremendous strides in scaling the ubiquitous metal-oxide-semiconductor
transistor, the underlying \textit{von-Neumann} computing architecture has
remained unchanged. The limited throughput and energy efficiency of
state-of-the-art computing systems result, to a large extent, from the
well-known \textit{von-Neumann bottleneck}. This energy and throughput
inefficiency has been accentuated in recent times by the present emphasis on
data-intensive applications such as artificial intelligence and machine
learning. A possible approach towards mitigating the overhead associated with
the von-Neumann bottleneck is to enable \textit{in-memory} Boolean
computations. In this manuscript, we present an augmented version of the
conventional SRAM bit-cell, called \textit{the X-SRAM}, with the ability to
perform in-memory, vector Boolean computations in addition to the usual memory
storage operations. We propose at least six different schemes for enabling
in-memory vector computations, including NAND, NOR, IMP (implication), and XOR
logic gates, with respect to different bit-cell topologies: the 8T cell and the
8T differential cell. In addition, we present a novel
\textit{`read-compute-store'} scheme, wherein the computed Boolean function can
be stored directly in the memory without the need to latch the data and carry
out a subsequent write operation. The feasibility of the proposed schemes has
been verified using predictive transistor models and Monte Carlo variation
analysis.

Comment: This article has been accepted in a future issue of IEEE Transactions
on Circuits and Systems-I: Regular Papers.
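The vector Boolean operations the abstract lists can be sketched functionally. This is a behavioral model only, not a circuit model of the bit-cells; the word width and function names are illustrative assumptions:

```python
# Functional sketch (not a circuit model) of the vector Boolean
# operations an X-SRAM-style array exposes: two stored words are
# combined bit-wise in what the hardware performs as a single
# read cycle over shared bitlines.

WIDTH = 8
MASK = (1 << WIDTH) - 1

def nand(a: int, b: int) -> int:
    return ~(a & b) & MASK

def nor(a: int, b: int) -> int:
    return ~(a | b) & MASK

def imp(a: int, b: int) -> int:
    # material implication: a IMP b = (NOT a) OR b
    return (~a & MASK) | b

def xor(a: int, b: int) -> int:
    return a ^ b

# 'read-compute-store': the computed word is written back to the
# array directly (here, a plain assignment stands in for the
# write-back that needs no intermediate latch-and-write cycle).
memory = [0b10110010, 0b11001100, 0]
memory[2] = nand(memory[0], memory[1])
```

The payoff in hardware is that both operands stay inside the array, so no word ever crosses the memory-processor boundary for these operations.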
Design Space Exploration and Comparative Evaluation of Memory Technologies for Synaptic Crossbar Arrays: Device-Circuit Non-Idealities and System Accuracy
In-memory computing (IMC) utilizing synaptic crossbar arrays is promising for
deep neural networks to attain high energy efficiency and integration density.
Towards that end, various CMOS and post-CMOS technologies have been explored as
promising synaptic device candidates which include SRAM, ReRAM, FeFET,
SOT-MRAM, etc. However, each of these technologies has its own pros and cons,
which need to be comparatively evaluated in the context of synaptic array
designs. For a fair comparison, such an analysis must carefully optimize each
technology, specifically for synaptic crossbar design accounting for device and
circuit non-idealities in crossbar arrays such as variations, wire resistance,
driver/sink resistance, etc. In this work, we perform a comprehensive design
space exploration and comparative evaluation of different technologies at the 7nm
technology node for synaptic crossbar arrays, in the context of IMC robustness
and system accuracy. Firstly, we integrate different technologies into a
cross-layer simulation flow based on physics-based models of synaptic devices
and interconnects. Secondly, we optimize both technology-agnostic design knobs
such as input encoding and ON-resistance as well as technology-specific design
parameters including ferroelectric thickness in FeFET and MgO thickness in
SOT-MRAM. Our optimization methodology accounts for the implications of device-
and circuit-level non-idealities on the system-level accuracy for each
technology. Finally, based on the optimized designs, we obtain inference
results for ResNet-20 on the CIFAR-10 dataset and show that FeFET-based
crossbar arrays achieve the highest accuracy due to their compactness, low
leakage, and high ON/OFF current ratio.
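The effect of the non-idealities the abstract enumerates can be illustrated with a toy matrix-vector multiply over a conductance matrix. The variation and wire-resistance terms below are crude placeholders, not the physics-based models the paper integrates:

```python
import random

# Illustrative (not physics-based) sketch of how device/circuit
# non-idealities perturb a crossbar matrix-vector multiply.
# G encodes weights as conductances; sigma models device-to-device
# variation; r_wire approximates IR drop along each column as a
# simple multiplicative attenuation growing with column depth.

def crossbar_mvm(G, v, sigma=0.0, r_wire=0.0, seed=0):
    rng = random.Random(seed)
    rows, cols = len(G), len(G[0])
    out = []
    for j in range(cols):
        acc = 0.0
        for i in range(rows):
            g = G[i][j] * (1.0 + rng.gauss(0.0, sigma))  # variation
            acc += g * v[i]
        # longer columns lose more signal to wire resistance
        out.append(acc / (1.0 + r_wire * rows))
    return out

G = [[0.5, 1.0], [1.0, 0.25]]
v = [1.0, 2.0]
ideal = crossbar_mvm(G, v)                           # exact dot products
noisy = crossbar_mvm(G, v, sigma=0.05, r_wire=0.01)  # perturbed outputs
```

A design-space exploration like the one described then amounts to choosing knobs (e.g. ON-resistance, input encoding) that keep `noisy` close enough to `ideal` that end-to-end DNN accuracy is preserved.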
A Survey on Layout Implementation and Analysis of Different SRAM Cell Topologies
Because battery-powered devices are so widespread, a primary goal of electronics design is low power consumption. Owing to its role in low-energy computing, memory-cell operation at low supply voltages has become a major interest in memory-cell design, and with the specification changes that accompany scaled technologies, stable operation is the critical requirement for successful low-voltage SRAM design. The traditional SRAM cell enables high density and fast differential sensing but suffers from half-select and read-disturb issues, and previous strategies for solving these problems have been ineffective owing to low efficiency, data-dependent leakage, and high energy per access. Our primary goals are to reduce power consumption, improve read performance, and reduce the area of the proposed cell. The proposed leakage-reduction circuit has been implemented in the Microwind tool and compared against the existing 6T, 8T, and 10T topologies. Since delay and power consumption are key factors in memory-cell performance, the simulation results show that the proposed design provides the fastest read operation and the best overall power-delay-product optimization. The overarching goal of this work is a low-power SRAM cell.
An ultra-low power in-memory computing cell for binarized neural networks
Deep Neural Networks (DNNs) are widely used in many artificial intelligence applications such as image classification and image recognition. Data movement in DNNs results in increased power consumption. The primary reason behind this energy-expensive data movement is the conventional von Neumann architecture, in which the computing unit and the memory are physically separated. To address this issue, in-memory computing schemes have been proposed in the literature. The fundamental principle behind in-memory computing is to enable vector computations closer to the memory. In-memory computing schemes based on CMOS technologies are of great importance nowadays due to the ease of mass production and commercialization. However, many of the proposed in-memory computing schemes suffer from power and performance degradation; some reduce power consumption only to a small extent, and doing so requires sacrificing the overall signal-to-noise ratio (SNR). This thesis discusses an efficient In-Memory Computing (IMC) cell for Binarized Neural Networks (BNNs). The IMC cell is modelled using a simple current-based computing method and offers a practical solution to the energy-expensive data movement within BNNs. A 4-bit Digital-to-Analog Converter (DAC) is designed and simulated using a 130nm CMOS process, and the functionality of the IMC scheme for BNNs is demonstrated with it. The results presented in this thesis show that this approach to IMC is capable of accurately performing the dot operation between the input activations and the weights. Furthermore, the 4-bit DAC provides 4-bit weight precision, which offers an effective means to improve the overall accuracy.
HyDe: A Hybrid PCM/FeFET/SRAM Device-search for Optimizing Area and Energy-efficiencies in Analog IMC Platforms
Today, there are a plethora of In-Memory Computing (IMC) devices- SRAMs, PCMs
& FeFETs, that emulate convolutions on crossbar-arrays with high throughput.
Each IMC device offers its own pros & cons during inference of Deep Neural
Networks (DNNs) on crossbars in terms of area overhead, programming energy and
non-idealities. A design-space exploration is, therefore, imperative to derive
a hybrid-device architecture optimized for accurate DNN inference under the
impact of non-idealities from multiple devices, while maintaining competitive
area & energy-efficiencies. We propose a two-phase search framework (HyDe) that
exploits the best of all worlds offered by multiple devices to determine an
optimal hybrid-device architecture for a given DNN topology. Our hybrid models
achieve up to 2.30-2.74x higher TOPS/mm^2 at 22-26% higher energy-efficiencies
than baseline homogeneous models for a VGG16 DNN topology. We further propose a
feasible implementation of the HyDe-derived hybrid-device architectures in the
2.5D design space using chiplets to reduce design effort and cost in the
hardware fabrication involving multiple technology processes.

Comment: Accepted to IEEE Journal on Emerging and Selected Topics in Circuits
and Systems (JETCAS).
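The core idea of a hybrid-device search can be sketched as an assignment problem: pick one device per layer to maximize a figure of merit trading accuracy against area and energy. All device numbers and the scoring weights below are made-up placeholders, not values from the paper, and the exhaustive search stands in for HyDe's two-phase framework:

```python
from itertools import product

# Toy per-layer device assignment in the spirit of a hybrid-device
# search. Tuples are (accuracy proxy, area cost, energy cost);
# every value is an illustrative placeholder.
DEVICES = {
    "SRAM":  (0.98, 3.0, 1.0),
    "PCM":   (0.90, 1.0, 2.0),
    "FeFET": (0.95, 1.5, 1.2),
}

def score(assignment):
    acc, area, energy = 1.0, 0.0, 0.0
    for dev in assignment:
        a, ar, en = DEVICES[dev]
        acc *= a          # accuracy degrades multiplicatively per layer
        area += ar
        energy += en
    return acc - 0.01 * area - 0.01 * energy  # weighted figure of merit

def search(num_layers):
    # brute force over all device-per-layer assignments
    return max(product(DEVICES, repeat=num_layers), key=score)

best = search(3)
```

With realistic non-ideality models in place of the placeholder numbers, the same structure lets different layers land on different devices, which is exactly the hybrid outcome the abstract describes.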