Robust Networks: Neural Networks Robust to Quantization Noise and Analog Computation Noise Based on Natural Gradient
abstract: Deep neural networks (DNNs) have had tremendous success in a variety of
statistical learning applications due to their vast expressive power. Most
applications run DNNs in the cloud on parallelized architectures. There is a need
for efficient DNN inference at the edge with low-precision hardware and analog
accelerators. To make trained models more robust for this setting, quantization and
analog compute noise are modeled as weight space perturbations to DNNs and an
information theoretic regularization scheme is used to penalize the KL-divergence
between perturbed and unperturbed models. This regularizer has similarities to
both natural gradient descent and knowledge distillation, but has the advantage of
explicitly promoting the network toward a broader minimum that is robust to
weight space perturbations. In addition to the proposed regularization,
KL-divergence is directly minimized using knowledge distillation. Initial validation
on FashionMNIST and CIFAR10 shows that the information theoretic regularizer
and knowledge distillation outperform existing quantization schemes based on the
straight-through estimator or L2-constrained quantization.
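As a minimal sketch of the regularization idea described in this abstract (not the authors' implementation), the snippet below penalizes the KL-divergence between a model's output distribution and that of a weight-perturbed copy of itself; the Gaussian noise scale sigma and the penalty weight lam are illustrative assumptions.

```python
# Minimal sketch: cross-entropy plus a KL penalty between the unperturbed
# model and a copy whose weights carry additive Gaussian noise, standing in
# for quantization / analog-compute noise. `sigma` and `lam` are illustrative.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def kl_robust_loss(model, x, y, sigma=0.01, lam=1.0):
    """Cross-entropy plus KL(clean || perturbed) under additive weight noise."""
    clean_logits = model(x)
    ce = F.cross_entropy(clean_logits, y)

    # Model quantization / analog-compute noise as a weight-space perturbation.
    perturbed = copy.deepcopy(model)
    with torch.no_grad():
        for p in perturbed.parameters():
            p.add_(sigma * torch.randn_like(p))

    # Penalize divergence between unperturbed and perturbed output distributions.
    kl = F.kl_div(F.log_softmax(perturbed(x), dim=-1),
                  F.softmax(clean_logits, dim=-1),
                  reduction="batchmean")
    return ce + lam * kl

# Toy usage standing in for a FashionMNIST batch.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 10))
x, y = torch.randn(32, 1, 28, 28), torch.randint(0, 10, (32,))
loss = kl_robust_loss(model, x, y)
loss.backward()
```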
EPIM: Efficient Processing-In-Memory Accelerators based on Epitome
The exploration of Processing-In-Memory (PIM) accelerators has garnered
significant attention within the research community. However, deploying large-scale
neural networks on PIM accelerators is challenging due to the constrained on-chip
memory capacity. To tackle
this issue, current works explore model compression algorithms to reduce the
size of Convolutional Neural Networks (CNNs). Most of these algorithms either
aim to represent neural operators with reduced-size parameters (e.g.,
quantization) or search for the best combinations of neural operators (e.g.,
neural architecture search). Designing neural operators to align with PIM
accelerators' specifications is an area that warrants further study. In this
paper, we introduce the Epitome, a lightweight neural operator offering
convolution-like functionality, to craft memory-efficient CNN operators for PIM
accelerators (EPIM). On the software side, we evaluate epitomes' latency and
energy on PIM accelerators and introduce a PIM-aware layer-wise design method
to enhance their hardware efficiency. We apply epitome-aware quantization to
further reduce the size of epitomes. On the hardware side, we modify the
datapath of current PIM accelerators to accommodate epitomes and implement a
feature map reuse technique to reduce computation cost. Experimental results
reveal that our 3-bit quantized EPIM-ResNet50 attains 71.59% top-1 accuracy on
ImageNet, reducing crossbar areas by 30.65 times. EPIM surpasses the
state-of-the-art pruning methods on PIM
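The abstract does not spell out the epitome operator itself, so the following is only a hedged sketch of a convolution-like operator whose parameter footprint is decoupled from the layer size: the full filter bank is gathered from a small shared parameter tensor, so on-chip storage scales with the epitome rather than with the layer. All shapes and the index-gathering scheme are assumptions for illustration, not the EPIM design.

```python
# Hedged sketch of an epitome-style convolution: the (out_ch, in_ch, k, k)
# weight tensor is assembled from slices of a much smaller parameter bank,
# so stored parameters are set by the epitome size. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EpitomeConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, k=3, epi_out=16, epi_in=8):
        super().__init__()
        # Small shared parameter bank: epi_out * epi_in * k * k values in total.
        self.epitome = nn.Parameter(0.02 * torch.randn(epi_out, epi_in, k, k))
        # Fixed mapping from each full filter/channel to a slice of the epitome.
        self.register_buffer("out_idx", torch.randint(0, epi_out, (out_ch,)))
        self.register_buffer("in_idx", torch.randint(0, epi_in, (out_ch, in_ch)))
        self.k = k

    def forward(self, x):
        # Gather the full weight tensor from the epitome, then convolve as usual.
        w = self.epitome[self.out_idx[:, None], self.in_idx]  # (out_ch, in_ch, k, k)
        return F.conv2d(x, w, padding=self.k // 2)

layer = EpitomeConv2d(64, 128)                  # 1,152 stored parameters
print(layer(torch.randn(1, 64, 32, 32)).shape)  # vs. 73,728 for a dense 3x3 conv
```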
Low-Power Computer Vision: Improve the Efficiency of Artificial Intelligence
Energy efficiency is critical for running computer vision on battery-powered systems, such as mobile phones or UAVs (unmanned aerial vehicles, or drones). This book collects the methods that have won the annual IEEE Low-Power Computer Vision Challenges since 2015. The winners share their solutions and provide insight into how to improve the efficiency of machine learning systems.
Embedded Machine Learning: Emphasis on Hardware Accelerators and Approximate Computing for Tactile Data Processing
Machine Learning (ML), a subset of Artificial Intelligence (AI), is driving the industrial
and technological revolution of the present and future. We envision a world with smart
devices that are able to mimic human behavior (sense, process, and act) and perform
tasks that at one time we thought could only be carried out by humans. The vision
is to achieve such a level of intelligence with affordable, power-efficient, and fast hardware
platforms. However, embedding machine learning algorithms in many application domains
such as the Internet of Things (IoT), prostheses, robotics, and wearable devices is an ongoing
challenge, one governed by the computational complexity of ML algorithms, the
performance and availability of hardware platforms, and the application's budget (power
constraints, real-time operation, etc.). In this dissertation, we focus on the design and
implementation of efficient ML algorithms to handle the aforementioned challenges. First, we
apply Approximate Computing Techniques (ACTs) to reduce the computational complexity of
ML algorithms. Then, we design custom Hardware Accelerators to improve the performance
of the implementation within a specified budget. Finally, a tactile data processing application
is adopted for the validation of the proposed exact and approximate embedded machine
learning accelerators.
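As one concrete, hedged example of an algorithmic-level approximate computing technique in this spirit (not necessarily one used in the dissertation), the sketch below reduces both operands of a dot product to a short fixed-point word length, trading accuracy for cheaper arithmetic; the word length and scaling scheme are illustrative assumptions.

```python
# Illustrative algorithmic-level approximation: quantize operands to a narrow
# fixed-point format before a dot product. `bits` and `frac_bits` are assumed
# values, not parameters taken from the dissertation.
import numpy as np

def to_fixed_point(x, bits=8, frac_bits=5):
    # Round to the nearest representable value and saturate to the word length.
    scale = 2.0 ** frac_bits
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return np.clip(np.round(x * scale), lo, hi) / scale

def approx_dot(w, x, bits=8, frac_bits=5):
    # Both operands are reduced to `bits`-wide fixed point before multiplying.
    return float(np.dot(to_fixed_point(w, bits, frac_bits),
                        to_fixed_point(x, bits, frac_bits)))

rng = np.random.default_rng(0)
w, x = rng.normal(size=64), rng.normal(size=64)
print(np.dot(w, x), approx_dot(w, x))  # exact vs. approximate result
```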
The dissertation starts with the introduction of the various ML algorithms used for
tactile data processing. These algorithms are assessed in terms of their computational
complexity and the available hardware platforms that could be used for implementation.
Afterward, a survey of existing approximate computing techniques and hardware
accelerator design methodologies is presented. Based on the findings of the survey, an
approach for applying algorithmic-level ACTs on machine learning algorithms is provided.
Then three novel hardware accelerators are proposed: (1) k-Nearest Neighbor (kNN) based
on a selection-based sorter, (2) Tensorial Support Vector Machine (TSVM) based on Shallow
Neural Networks, and (3) Hybrid-Precision Binary Convolutional Neural Network (BCNN).
The three accelerators offer real-time classification with substantial reductions in
hardware resources and power consumption compared to existing FPGA implementations
targeting the same tactile data processing application. Moreover, the approximate
accelerators maintain high classification accuracy, with a loss of at most 5%.
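A selection-based kNN of the kind mentioned above can be pictured with the short sketch below: rather than fully sorting all distances, only the k smallest are selected before the majority vote, which is the operation a selection-based sorter would implement in hardware. The data shapes and the Euclidean metric are illustrative assumptions, not details from the dissertation.

```python
# Hedged sketch of a selection-based kNN classifier: np.argpartition extracts
# the k nearest neighbours without a full sort. Shapes and metric are assumed.
import numpy as np

def knn_predict(train_x, train_y, query, k=3):
    # Squared Euclidean distances from the query to every training sample.
    d = np.sum((train_x - query) ** 2, axis=1)
    # Selection step: indices of the k smallest distances, no full sort needed.
    nearest = np.argpartition(d, k)[:k]
    # Majority vote among the selected neighbours.
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy usage with random feature vectors standing in for tactile data.
rng = np.random.default_rng(0)
train_x = rng.normal(size=(100, 16))
train_y = rng.integers(0, 3, size=100)
print(knn_predict(train_x, train_y, rng.normal(size=16)))
```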