Robust Networks: Neural Networks Robust to Quantization Noise and Analog Computation Noise Based on Natural Gradient
abstract: Deep neural networks (DNNs) have had tremendous success in a variety of
statistical learning applications due to their vast expressive power. Most
applications run DNNs in the cloud on parallelized architectures. There is a need
for efficient DNN inference at the edge with low-precision hardware and analog
accelerators. To make trained models more robust for this setting, quantization and
analog compute noise are modeled as weight space perturbations to DNNs and an
information theoretic regularization scheme is used to penalize the KL-divergence
between perturbed and unperturbed models. This regularizer has similarities to
both natural gradient descent and knowledge distillation, but has the advantage of
explicitly encouraging the network to find a broader minimum that is robust to
weight space perturbations. In addition to the proposed regularization,
KL-divergence is directly minimized using knowledge distillation. Initial validation
on FashionMNIST and CIFAR10 shows that the information theoretic regularizer
and knowledge distillation outperform existing quantization schemes based on the
straight-through estimator or L2-constrained quantization.
Dissertation/Thesis: Masters Thesis, Computer Engineering, 201
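To make the proposed regularizer concrete, here is a minimal PyTorch sketch of a KL-divergence penalty between the unperturbed model and a forward pass with Gaussian-perturbed weights. The noise model, the hyperparameter names sigma and lam, and the toy model are illustrative assumptions, not the thesis' exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.func import functional_call

def kl_perturbation_penalty(model, x, sigma=0.01):
    """KL divergence between the model's predictive distribution before and
    after additive Gaussian weight-space noise (the noise model is assumed)."""
    log_p_clean = F.log_softmax(model(x), dim=-1)

    # Second forward pass with perturbed weights; gradients flow back to the
    # original parameters through the (p + noise) expressions.
    noisy_params = {name: p + sigma * torch.randn_like(p)
                    for name, p in model.named_parameters()}
    log_p_noisy = F.log_softmax(functional_call(model, noisy_params, (x,)),
                                dim=-1)

    # KL(clean || noisy); log_target=True since both arguments are log-probs.
    return F.kl_div(log_p_noisy, log_p_clean,
                    reduction="batchmean", log_target=True)

# Usage: add the penalty to the ordinary task loss.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
x = torch.randn(32, 1, 28, 28)
y = torch.randint(0, 10, (32,))
lam = 0.1  # regularization strength (assumed hyperparameter)
loss = F.cross_entropy(model(x), y) + lam * kl_perturbation_penalty(model, x)
loss.backward()
```

In practice the KL direction and the noise distribution (e.g., matched to a target quantizer's step size) would follow the thesis' perturbation model; this sketch only shows the training-loop mechanics.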
ASCEND: Accurate yet Efficient End-to-End Stochastic Computing Acceleration of Vision Transformer
Stochastic computing (SC) has emerged as a promising computing paradigm for
neural acceleration. However, how to accelerate the state-of-the-art Vision
Transformer (ViT) with SC remains unclear. Unlike convolutional neural
networks, ViTs introduce notable compatibility and efficiency challenges
because of their nonlinear functions, e.g., softmax and Gaussian Error Linear
Units (GELU). In this paper, for the first time, a ViT accelerator based on
end-to-end SC, dubbed ASCEND, is proposed. ASCEND co-designs the SC circuits
and ViT networks to enable accurate yet efficient acceleration. To overcome the
compatibility challenges, ASCEND proposes a novel deterministic SC block for
GELU and leverages an SC-friendly iterative approximate algorithm to design an
accurate and efficient softmax circuit. To improve inference efficiency, ASCEND
develops a two-stage training pipeline to produce accurate low-precision ViTs.
With extensive experiments, we show the proposed GELU and softmax blocks
achieve 56.3% and 22.6% error reduction compared to existing SC designs,
respectively, and reduce the area-delay product (ADP) by 5.29x and 12.6x,
respectively. Moreover, compared to the baseline low-precision ViTs, ASCEND
also achieves significant accuracy improvements on CIFAR10 and CIFAR100.
Comment: Accepted in DATE 202
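As background, the sketch below illustrates the conventional stochastic computing paradigm the abstract builds on: values in [0, 1] are encoded as random bitstreams whose mean equals the value, so multiplication reduces to a bitwise AND. This is a generic unipolar-SC illustration, not ASCEND's deterministic GELU block or iterative softmax circuit.

```python
import numpy as np

rng = np.random.default_rng(0)

def to_bitstream(x, n_bits=4096):
    """Unipolar encoding: each bit is 1 with probability x."""
    return rng.random(n_bits) < x

def from_bitstream(bits):
    """Decode by taking the fraction of ones."""
    return bits.mean()

# Multiplication of two independent streams is a single AND gate per bit.
a, b = 0.8, 0.5
prod = from_bitstream(to_bitstream(a) & to_bitstream(b))
print(f"SC estimate of {a} * {b}: {prod:.3f}")  # ~0.4, with sampling noise
```

The bitstream length trades latency for accuracy, which is precisely the efficiency pressure that motivates deterministic and approximate SC designs such as those proposed in ASCEND.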
Conversion of Artificial Recurrent Neural Networks to Spiking Neural Networks for Low-power Neuromorphic Hardware
In recent years, the field of low-power neuromorphic systems, which consume
orders of magnitude less power, has gained significant momentum. However, their
wider use is still hindered by the lack of algorithms that can harness the
strengths of such architectures. While neuromorphic adaptations of
representation learning algorithms are now emerging, efficient processing of
temporal sequences or variable-length inputs remains difficult. Recurrent neural
networks (RNNs) are widely used in machine learning to solve a variety of
sequence learning tasks. In this work we present a train-and-constrain
methodology that enables the mapping of machine-learned (Elman) RNNs onto a
substrate of spiking neurons, while being compatible with the capabilities of
current and near-future neuromorphic systems. This "train-and-constrain" method
consists of first training RNNs using backpropagation through time, then
discretizing the weights and finally converting them to spiking RNNs by
matching the responses of artificial neurons with those of the spiking neurons.
We demonstrate our approach on a natural language processing task (question
classification), mapping the entire recurrent layer of the network onto IBM's
Neurosynaptic System "TrueNorth", a
spike-based digital neuromorphic hardware architecture. TrueNorth imposes
specific constraints on connectivity, neural and synaptic parameters. To
satisfy these constraints, it was necessary to discretize the synaptic weights
and neural activities to 16 levels, and to limit fan-in to 64 inputs. We find
that short synaptic delays are sufficient to implement the dynamical (temporal)
aspect of the RNN in the question classification task. The hardware-constrained
model achieved 74% accuracy in question classification while using less than
0.025% of the cores on one TrueNorth chip, resulting in an estimated power
consumption of ~17 µW.
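A rough Python sketch of the "constrain" step described above, assuming uniform discretization of the weight range to 16 levels and magnitude-based pruning to a 64-input fan-in; the paper's actual mapping to TrueNorth's synaptic parameters (and the matching of neural activities) is more involved.

```python
import numpy as np

def discretize(w, levels=16):
    """Uniformly quantize weights onto `levels` evenly spaced values
    spanning the observed weight range (grid choice is an assumption)."""
    lo, hi = w.min(), w.max()
    step = (hi - lo) / (levels - 1)
    return lo + np.round((w - lo) / step) * step

def cap_fanin(w, max_fanin=64):
    """Keep only the `max_fanin` largest-magnitude inputs per output neuron,
    zeroing the rest (magnitude-based pruning is an assumption)."""
    w = w.copy()
    for row in w:                       # one row per post-synaptic neuron
        drop = np.argsort(np.abs(row))[:-max_fanin]
        row[drop] = 0.0
    return w

rng = np.random.default_rng(0)
w = rng.normal(size=(10, 128))          # (outputs, inputs) recurrent weights
w_hw = cap_fanin(discretize(w), max_fanin=64)
assert (np.count_nonzero(w_hw, axis=1) <= 64).all()
```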