XploreNAS: Explore Adversarially Robust & Hardware-efficient Neural Architectures for Non-ideal Xbars
Compute-In-Memory platforms such as memristive crossbars are gaining attention as
they facilitate acceleration of Deep Neural Networks (DNNs) with high area- and
compute-efficiencies. However, the intrinsic non-idealities associated with the
analog nature of computing in crossbars limit the performance of the deployed
DNNs. Furthermore, DNNs are shown to be vulnerable to adversarial attacks,
leading to severe security threats in their large-scale deployment. Thus,
finding adversarially robust DNN architectures for non-ideal crossbars is
critical to the safe and secure deployment of DNNs on the edge. This work
proposes a two-phase algorithm-hardware co-optimization approach called
XploreNAS that searches for hardware-efficient & adversarially robust neural
architectures for non-ideal crossbar platforms. We use the one-shot Neural
Architecture Search (NAS) approach to train a large Supernet with
crossbar-awareness and sample adversarially robust Subnets therefrom,
maintaining competitive hardware-efficiency. Our experiments on crossbars with
benchmark datasets (SVHN, CIFAR10 & CIFAR100) show up to ~8-16% improvement in
the adversarial robustness of the searched Subnets against a baseline ResNet-18
model subjected to crossbar-aware adversarial training. We benchmark our robust
Subnets for Energy-Delay-Area Products (EDAPs) using the NeuroSim tool and find
that, with additional hardware-efficiency-driven optimizations, the Subnets
attain ~1.5-1.6x lower EDAPs than the ResNet-18 baseline.
Comment: 16 pages, 8 figures, 2 tables
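The abstract does not spell out the search mechanics, but the one-shot Supernet idea can be illustrated with a minimal PyTorch sketch, assuming a DARTS-style softmax mixture of candidate operations and an FGSM-style adversarial loss as a stand-in for the paper's crossbar-aware adversarial training; SupernetCell, robust_training_loss and the candidate-op list below are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SupernetCell(nn.Module):
    """One-shot NAS sketch: candidate ops are mixed by softmax-weighted
    architecture parameters (alphas); a Subnet is later sampled by taking
    the argmax op in each cell."""
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, channels, 5, padding=2),
            nn.Identity(),
        ])
        self.alphas = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        weights = F.softmax(self.alphas, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

    def sample_subnet_op(self):
        # Subnet sampling: keep only the strongest candidate op per cell.
        return self.ops[int(self.alphas.argmax())]

def robust_training_loss(model, x, y, eps=2 / 255):
    """FGSM-style adversarial loss for Supernet training, standing in for
    crossbar-aware adversarial training (crossbar non-idealities are not
    modeled in this sketch)."""
    x_adv = x.clone().detach().requires_grad_(True)
    grad = torch.autograd.grad(F.cross_entropy(model(x_adv), y), x_adv)[0]
    x_adv = (x_adv + eps * grad.sign()).detach()
    return F.cross_entropy(model(x_adv), y)
```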
HyDe: A Hybrid PCM/FeFET/SRAM Device-search for Optimizing Area and Energy-efficiencies in Analog IMC Platforms
Today, there is a plethora of In-Memory Computing (IMC) devices, such as SRAMs,
PCMs & FeFETs, that emulate convolutions on crossbar arrays with high throughput.
Each IMC device offers its own pros & cons during inference of Deep Neural
Networks (DNNs) on crossbars in terms of area overhead, programming energy and
non-idealities. A design-space exploration is, therefore, imperative to derive
a hybrid-device architecture optimized for accurate DNN inference under the
impact of non-idealities from multiple devices, while maintaining competitive
area & energy-efficiencies. We propose a two-phase search framework (HyDe) that
exploits the best of all worlds offered by multiple devices to determine an
optimal hybrid-device architecture for a given DNN topology. Our hybrid models
achieve up to 2.30-2.74x higher TOPS/mm^2 at 22-26% higher energy-efficiencies
than baseline homogeneous models for a VGG16 DNN topology. We further propose a
feasible implementation of the HyDe-derived hybrid-device architectures in the
2.5D design space using chiplets to reduce design effort and cost in the
hardware fabrication involving multiple technology processes.
Comment: Accepted to IEEE Journal on Emerging and Selected Topics in Circuits and Systems (JETCAS)
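As a rough illustration of the per-layer device search, the sketch below mixes device-specific noisy views of a layer's weights using learnable selection logits and exposes an area/energy cost term; the device figures, noise model and names (DEVICES, DeviceChoice) are placeholders assumed for the example, not values or code from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Placeholder per-device figures (relative area, energy, weight noise), not measured values.
DEVICES = {"SRAM":  {"area": 1.0, "energy": 1.0, "noise": 0.01},
           "PCM":   {"area": 0.4, "energy": 0.6, "noise": 0.05},
           "FeFET": {"area": 0.3, "energy": 0.5, "noise": 0.03}}

class DeviceChoice(nn.Module):
    """Per-layer device selection sketch: a softmax over learnable logits mixes
    device-specific noisy weights, and the expected area/energy cost can be
    added to the training loss so accuracy is traded against efficiency."""
    def __init__(self):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(len(DEVICES)))

    def noisy_weight(self, w):
        probs = F.softmax(self.logits, dim=0)
        mixed = sum(p * (w + d["noise"] * torch.randn_like(w))
                    for p, d in zip(probs, DEVICES.values()))
        cost = sum(p * (d["area"] + d["energy"])
                   for p, d in zip(probs, DEVICES.values()))
        return mixed, cost
```

After training, the argmax of each layer's logits gives that layer's device assignment in the hybrid architecture.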
XPert: Peripheral Circuit & Neural Architecture Co-search for Area and Energy-efficient Xbar-based Computing
The hardware-efficiency and accuracy of Deep Neural Networks (DNNs)
implemented on In-memory Computing (IMC) architectures primarily depend on the
DNN architecture and the peripheral circuit parameters. It is therefore
essential to holistically co-search the network and peripheral parameters to
achieve optimal performance. To this end, we propose XPert, which co-searches
network architecture in tandem with peripheral parameters such as the type and
precision of analog-to-digital converters, crossbar column sharing and the
layer-specific input precision using an optimization-based design space
exploration. Compared to VGG16 baselines, XPert achieves 10.24x (4.7x) lower
EDAP, 1.72x (1.62x) higher TOPS/W, 1.93x (3x) higher TOPS/mm^2 at 92.46% (56.7%)
accuracy for the CIFAR10 (TinyImagenet) datasets. The code for this paper is
available at https://github.com/Intelligent-Computing-Lab-Yale/XPert.
Comment: Accepted to the Design Automation Conference (DAC)
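XPert's exploration is optimization-based; purely to show what is being co-searched on the peripheral side, here is a toy brute-force scan over an assumed configuration space (ADC type and precision, column sharing, input precision) with a made-up EDAP proxy. SEARCH_SPACE, estimate_edap and the eval_accuracy callback are assumptions for illustration, not the paper's method.

```python
import itertools

# Assumed peripheral search space; the real space and cost model differ.
SEARCH_SPACE = {
    "adc_type": ["SAR", "Flash"],
    "adc_bits": [4, 6, 8],
    "column_share": [4, 8, 16],
    "input_bits": [4, 8],
}

def estimate_edap(cfg):
    """Toy cost proxy: higher ADC and input precision cost more,
    more column sharing amortizes the ADC cost."""
    adc_cost = (2 ** cfg["adc_bits"]) * (1.5 if cfg["adc_type"] == "Flash" else 1.0)
    return adc_cost * cfg["input_bits"] / cfg["column_share"]

def co_search(eval_accuracy, min_acc=0.90):
    """Return the lowest-cost configuration whose estimated accuracy
    (supplied by the caller) stays above a floor."""
    keys = list(SEARCH_SPACE)
    best = None
    for values in itertools.product(*(SEARCH_SPACE[k] for k in keys)):
        cfg = dict(zip(keys, values))
        if eval_accuracy(cfg) < min_acc:
            continue
        if best is None or estimate_edap(cfg) < estimate_edap(best):
            best = cfg
    return best
```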
MINT: Multiplier-less Integer Quantization for Spiking Neural Networks
We propose Multiplier-less INTeger (MINT) quantization, an efficient uniform
quantization scheme for the weights and membrane potentials in spiking neural
networks (SNNs). Unlike prior SNN quantization works, MINT quantizes the
memory-hungry membrane potentials to extremely low precision (2-bit) to
significantly reduce the total memory footprint. Additionally, MINT
quantization shares the quantization scaling factor between the weights and
membrane potentials, eliminating the need for multipliers that are necessary
for vanilla uniform quantization. Experimental results demonstrate that our
proposed method achieves accuracy that matches the full-precision models and
other state-of-the-art SNN quantization works while outperforming them on total
memory footprint and hardware cost at deployment. For instance, 2-bit MINT
VGG-16 achieves 90.6% accuracy on CIFAR-10 with approximately 93.8% reduction
in total memory footprint relative to the full-precision model; meanwhile, it
reduces computation energy by 90% compared to vanilla uniform quantization at
deployment.
Comment: 6 pages. Accepted to the 29th Asia and South Pacific Design Automation Conference (ASP-DAC 2024)
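The multiplier-less property follows from sharing one scaling factor: if weights and membrane potentials are quantized with the same scale, the LIF accumulation can stay in the integer domain with no rescaling multiply. The sketch below assumes a symmetric uniform quantizer and a hard reset; the bit-widths, reset rule and helper names are illustrative, not the paper's exact scheme.

```python
import torch

def quantize_shared(w_fp, u_fp, w_bits=2, u_bits=2):
    """Uniform quantization with a single scale shared by weights and
    membrane potentials (illustrative MINT-style sketch)."""
    scale = w_fp.abs().max() / (2 ** (w_bits - 1) - 1)   # shared scaling factor
    w_int = torch.clamp(torch.round(w_fp / scale),
                        -(2 ** (w_bits - 1)), 2 ** (w_bits - 1) - 1)
    u_int = torch.clamp(torch.round(u_fp / scale),
                        -(2 ** (u_bits - 1)), 2 ** (u_bits - 1) - 1)
    return w_int, u_int, scale

def integer_lif_step(u_int, w_int, in_spikes, threshold_int):
    """LIF update entirely in the shared integer domain: binary input spikes
    make the weighted sum pure accumulation, and no multiplier is needed to
    re-align the scales of weights and membrane potential."""
    u_int = u_int + w_int @ in_spikes
    out_spikes = (u_int >= threshold_int).float()
    u_int = torch.where(out_spikes.bool(), torch.zeros_like(u_int), u_int)  # hard reset
    return u_int, out_spikes
```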
Do We Really Need a Large Number of Visual Prompts?
Due to increasing interest in adapting models on resource-constrained edge
devices, parameter-efficient transfer learning has been widely explored. Among
various methods, Visual Prompt Tuning (VPT), which prepends learnable prompts to
the input space, shows competitive fine-tuning performance compared to training
the full network parameters. However, VPT increases the number of input tokens,
resulting in additional computational overhead. In this paper, we analyze the
impact of the number of prompts on fine-tuning performance and self-attention
operation in a vision transformer architecture. Through theoretical and
empirical analysis, we show that adding more prompts does not lead to linear
performance improvement. Further, we propose a Prompt Condensation (PC)
technique that aims to prevent performance degradation from using a small
number of prompts. We validate our methods on FGVC and VTAB-1k tasks and show
that our approach reduces the number of prompts by ~70% while maintaining
accuracy.
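For context, a minimal sketch of prompt prepending in a frozen vision transformer is shown below; the class name, shapes and initialization are assumptions, and Prompt Condensation itself (how prompts are selected and condensed) is not reproduced here.

```python
import torch
import torch.nn as nn

class PromptedEncoder(nn.Module):
    """Visual-prompt-tuning sketch: learnable prompt tokens are prepended to
    the patch-token sequence of a frozen transformer encoder; only the
    prompts (and typically a task head) are trained."""
    def __init__(self, encoder, embed_dim=768, num_prompts=8):
        super().__init__()
        self.encoder = encoder                      # frozen pretrained encoder
        for p in self.encoder.parameters():
            p.requires_grad = False
        self.prompts = nn.Parameter(torch.randn(1, num_prompts, embed_dim) * 0.02)

    def forward(self, tokens):                      # tokens: (B, N, D) patch embeddings
        prompts = self.prompts.expand(tokens.size(0), -1, -1)
        return self.encoder(torch.cat([prompts, tokens], dim=1))
```

Every added prompt is one more token through every self-attention layer, which is the overhead the paper analyzes; Prompt Condensation then aims to keep accuracy while shrinking num_prompts by roughly 70%.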
Input-Aware Dynamic Timestep Spiking Neural Networks for Efficient In-Memory Computing
Spiking Neural Networks (SNNs) have recently attracted widespread research
interest as an efficient alternative to traditional Artificial Neural Networks
(ANNs) because of their capability to process sparse and binary spike
information and avoid expensive multiplication operations. Although the
efficiency of SNNs can be realized on the In-Memory Computing (IMC)
architecture, we show that the energy cost and latency of SNNs scale linearly
with the number of timesteps used on IMC hardware. Therefore, in order to
maximize the efficiency of SNNs, we propose input-aware Dynamic Timestep SNN
(DT-SNN), a novel algorithmic solution to dynamically determine the number of
timesteps during inference on an input-dependent basis. After each timestep, we
calculate the entropy of the accumulated output and compare it to a predefined
threshold to decide whether the information processed so far is sufficient for
a confident prediction. We deploy DT-SNN on an IMC
architecture and show that it incurs negligible computational overhead. We
demonstrate that our method only uses 1.46 average timesteps to achieve the
accuracy of a 4-timestep static SNN while reducing the energy-delay-product by
80%.
Comment: Published at the Design Automation Conference (DAC) 2023
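The entropy-based exit criterion can be summarized with a short sketch; snn_step (one timestep of the SNN, returning logits and keeping its own membrane state) and the threshold value are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def dt_snn_inference(snn_step, x, max_timesteps=4, entropy_threshold=0.5):
    """Input-aware dynamic-timestep inference: accumulate per-timestep logits
    and stop as soon as the entropy of the accumulated prediction drops
    below a predefined threshold."""
    accumulated = None
    for t in range(max_timesteps):
        logits = snn_step(x, t)                           # logits from timestep t
        accumulated = logits if accumulated is None else accumulated + logits
        probs = F.softmax(accumulated, dim=-1)
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1).mean()
        if entropy < entropy_threshold:                   # confident enough: exit early
            break
    return accumulated, t + 1                             # prediction and timesteps used
```

Easy inputs exit after one or two timesteps while harder ones use the full budget, which is how an average as low as 1.46 timesteps can arise.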
Sharing Leaky-Integrate-and-Fire Neurons for Memory-Efficient Spiking Neural Networks
Spiking Neural Networks (SNNs) have gained increasing attention as
energy-efficient neural networks owing to their binary and asynchronous
computation. However, their non-linear activation, the
Leaky-Integrate-and-Fire (LIF) neuron, requires additional memory to store a
membrane voltage that captures the temporal dynamics of spikes. Although the
memory cost of LIF neurons increases significantly as the input dimension
grows, techniques to reduce the memory of LIF neurons have not
been explored so far. To address this, we propose a simple and effective
solution, EfficientLIF-Net, which shares the LIF neurons across different
layers and channels. Our EfficientLIF-Net achieves accuracy comparable to
standard SNNs while providing up to ~4.3X forward memory efficiency and ~21.9X
backward memory efficiency for LIF neurons. We conduct experiments on various
datasets including CIFAR10, CIFAR100, TinyImageNet, ImageNet-100, and
N-Caltech101. Furthermore, we show that our approach also offers advantages on
Human Activity Recognition (HAR) datasets, which heavily rely on temporal
information.
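A minimal sketch of the sharing idea is given below, assuming the layers that share a neuron produce matching activation shapes; the leak, threshold and reset rule are illustrative choices, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class SharedLIF(nn.Module):
    """Cross-layer LIF sharing sketch: a single membrane-potential buffer is
    reused by every layer that calls this module, instead of each layer
    storing its own membrane state."""
    def __init__(self, leak=0.9, threshold=1.0):
        super().__init__()
        self.leak, self.threshold = leak, threshold
        self.membrane = None                       # shared state across callers

    def reset(self):                               # call between input samples
        self.membrane = None

    def forward(self, current):
        # layers sharing this neuron are assumed to emit matching shapes
        if self.membrane is None:
            self.membrane = torch.zeros_like(current)
        self.membrane = self.leak * self.membrane + current
        spikes = (self.membrane >= self.threshold).float()
        self.membrane = self.membrane - spikes * self.threshold   # soft reset
        return spikes
```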