Matrix and tensor decompositions for training binary neural networks
This paper is on improving the training of binary neural networks in which
both activations and weights are binary. While prior methods for neural network
binarization binarize each filter independently, we propose to instead
parametrize the weight tensor of each layer using matrix or tensor
decomposition. The binarization process is then performed using this latent
parametrization, via a quantization function (e.g. sign function) applied to
the reconstructed weights. A key feature of our method is that while the
reconstruction is binarized, the computation in the latent factorized space is
done in the real domain. This has several advantages: (i) the latent
factorization enforces a coupling of the filters before binarization, which
significantly improves the accuracy of the trained models. (ii) while at
training time, the binary weights of each convolutional layer are parametrized
using real-valued matrix or tensor decomposition, during inference we simply
use the reconstructed (binary) weights. As a result, our method does not
sacrifice any advantage of binary networks in terms of model compression and
speeding-up inference. As a further contribution, instead of computing the
binary weight scaling factors analytically, as in prior work, we propose to
learn them discriminatively via back-propagation. Finally, we show that our
approach significantly outperforms existing methods when tested on the
challenging tasks of (a) human pose estimation (more than 4% improvements) and
(b) ImageNet classification (up to 5% performance gains).
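As a rough illustration of this kind of latent parametrization (a sketch under assumptions, not the authors' code: the rank, the layer sizes, and the per-output-channel learned scale are illustrative choices, and activations are left real-valued here), a convolutional layer can reconstruct its weight from two real-valued factors and binarize the reconstruction with a straight-through sign:

```python
# Hedged sketch: low-rank latent parametrization of a binarized conv weight,
# binarized with a sign function through a straight-through estimator (STE).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SignSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # Pass gradients only where the input lies in [-1, 1] (clipped STE).
        return grad_out * (x.abs() <= 1).float()


class LowRankBinaryConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, k=3, rank=8):
        super().__init__()
        # Latent real-valued factors; their product reconstructs the weight
        # matrix of shape (out_ch, in_ch * k * k), coupling all filters.
        self.U = nn.Parameter(torch.randn(out_ch, rank) * 0.1)
        self.V = nn.Parameter(torch.randn(rank, in_ch * k * k) * 0.1)
        # Scaling factors learned by back-propagation (one per output channel)
        # rather than computed analytically.
        self.alpha = nn.Parameter(torch.ones(out_ch, 1, 1, 1))
        self.shape = (out_ch, in_ch, k, k)

    def forward(self, x):
        w_real = (self.U @ self.V).view(self.shape)   # reconstruction
        w_bin = SignSTE.apply(w_real)                  # binarized weights
        return F.conv2d(x, self.alpha * w_bin, padding=1)


# At inference only alpha * sign(U @ V) is needed, so the real-valued factors
# can be discarded and the layer stored as binary weights plus channel scales.
layer = LowRankBinaryConv2d(16, 32)
out = layer(torch.randn(2, 16, 8, 8))
print(out.shape)  # torch.Size([2, 32, 8, 8])
```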
Improved training of binary networks for human pose estimation and image recognition
Big neural networks trained on large datasets have advanced the
state-of-the-art for a large variety of challenging problems, improving
performance by a large margin. However, under low memory and limited
computational power constraints, the accuracy on the same problems drops
considerably. In this paper, we propose a series of techniques that
significantly improve the accuracy of binarized neural networks (i.e. networks
where both the features and the weights are binary). We evaluate the proposed
improvements on two diverse tasks: fine-grained recognition (human pose
estimation) and large-scale image recognition (ImageNet classification).
Specifically, we introduce a series of novel methodological changes including:
(a) more appropriate activation functions, (b) reverse-order initialization,
(c) progressive quantization, and (d) network stacking, and show that these
additions significantly improve existing state-of-the-art network binarization
techniques. Additionally, for the first time, we also investigate the extent
to which network binarization and knowledge distillation can be combined. When
tested on the challenging MPII dataset, our method shows a performance
improvement of more than 4% in absolute terms. Finally, we further validate our
findings by applying the proposed techniques for large-scale object recognition
on the ImageNet dataset, on which we report a 4% reduction in error rate.
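The abstract does not spell out the progressive quantization schedule, so the following is only one plausible sketch: weights are blended between their real and binarized values, with the blend annealed toward fully binary over training. The helper name and the linear schedule are assumptions.

```python
# Hedged illustration of one way "progressive quantization" can be realized:
# anneal a blend factor from 0 (real-valued weights) to 1 (fully binary).
import torch


def progressive_weight(w: torch.Tensor, step: int, total_steps: int) -> torch.Tensor:
    """Return a weight tensor that is real-valued early in training and
    fully binarized (sign) by the end of the schedule."""
    t = min(step / total_steps, 1.0)          # anneal 0 -> 1
    w_bin = torch.sign(w)
    # The detach trick keeps the straight-through gradient flowing to w.
    return (1.0 - t) * w + t * (w_bin.detach() + w - w.detach())


w = torch.randn(4, 4, requires_grad=True)
for step in (0, 500, 1000):
    print(step, progressive_weight(w, step, total_steps=1000).abs().mean().item())
```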
NullaNet: Training Deep Neural Networks for Reduced-Memory-Access Inference
Deep neural networks have been successfully deployed in a wide variety of
applications including computer vision and speech recognition. However,
computational and storage complexity of these models has forced the majority of
computations to be performed on high-end computing platforms or on the cloud.
To cope with computational and storage complexity of these models, this paper
presents a training method that enables a radically different approach for
realization of deep neural networks through Boolean logic minimization. The
aforementioned realization completely removes the energy-hungry step of
accessing memory for obtaining model parameters, consumes about two orders of
magnitude fewer computing resources compared to realizations that use
floating-point operations, and has substantially lower latency.
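As a hedged sketch of the underlying idea (the paper's actual logic-minimization flow, including don't-care exploitation, is not shown), a binarized neuron with small fan-in is just a Boolean function: it can be tabulated once and then realized as logic with no weight fetches. The fan-in, weights, and threshold below are arbitrary examples.

```python
# Hedged sketch: tabulate a small binarized neuron as a Boolean function so
# that no model parameters need to be read from memory at inference time.
from itertools import product

import numpy as np

fan_in = 4
weights = np.array([1, -1, 1, 1])      # illustrative {-1,+1} weights
threshold = 0                          # sign activation

truth_table = {}
for bits in product([0, 1], repeat=fan_in):
    x = np.array([1 if b else -1 for b in bits])   # map {0,1} -> {-1,+1}
    truth_table[bits] = int(np.dot(weights, x) > threshold)

# The table (2^4 = 16 entries here) fully describes the neuron; at inference
# only the minimized logic is evaluated, not a weight lookup.
print(truth_table[(1, 0, 1, 1)])
```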
Progressive DNN Compression: A Key to Achieve Ultra-High Weight Pruning and Quantization Rates using ADMM
Weight pruning and weight quantization are two important categories of DNN
model compression. Prior work on these techniques is mainly based on
heuristics. A recent work developed a systematic framework for DNN weight
pruning using the advanced optimization technique ADMM (Alternating Direction
Method of Multipliers), achieving state-of-the-art weight pruning results. In
this work, we first extend this one-shot ADMM-based framework to guarantee
solution feasibility and provide a fast convergence rate, and generalize it to
weight quantization as well. We have further developed a
multi-step, progressive DNN weight pruning and quantization framework, with
dual benefits of (i) achieving further weight pruning/quantization thanks to
the special property of ADMM regularization, and (ii) reducing the search space
within each step. Extensive experimental results demonstrate superior
performance compared with prior work. Some highlights: (i) we achieve 246x, 36x,
and 8x weight pruning on LeNet-5, AlexNet, and ResNet-50 models, respectively,
with (almost) zero accuracy loss; (ii) even a significant 61x weight pruning in
AlexNet (ImageNet) results in only minor degradation in actual accuracy
compared with prior work; (iii) we are among the first to derive notable weight
pruning results for ResNet and MobileNet models; (iv) we derive the first
lossless, fully binarized (for all layers) LeNet-5 for MNIST and VGG-16 for
CIFAR-10; and (v) we derive the first fully binarized (for all layers) ResNet
for ImageNet with reasonable accuracy loss.
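A minimal sketch of the ADMM pruning loop described above (the penalty weight rho, sparsity target, learning rate, and step counts are illustrative, not the paper's settings):

```python
# Hedged sketch of ADMM-based weight pruning: W is trained with a quadratic
# penalty pulling it toward an auxiliary variable Z, which is projected onto
# the sparsity constraint (keep the largest-magnitude weights).
import torch


def project_sparse(w: torch.Tensor, keep_ratio: float) -> torch.Tensor:
    """Keep only the keep_ratio fraction of largest-magnitude entries."""
    k = max(1, int(keep_ratio * w.numel()))
    thresh = w.abs().flatten().topk(k).values.min()
    return torch.where(w.abs() >= thresh, w, torch.zeros_like(w))


def admm_prune(model_loss, W, rho=1e-3, keep_ratio=0.1,
               outer_steps=5, inner_steps=100, lr=1e-2):
    Z = project_sparse(W.detach().clone(), keep_ratio)
    U = torch.zeros_like(W)
    opt = torch.optim.SGD([W], lr=lr)
    for _ in range(outer_steps):
        for _ in range(inner_steps):                      # W-update (SGD)
            opt.zero_grad()
            loss = model_loss(W) + 0.5 * rho * (W - Z + U).pow(2).sum()
            loss.backward()
            opt.step()
        Z = project_sparse((W + U).detach(), keep_ratio)  # Z-update (projection)
        U = U + W.detach() - Z                            # dual update
    return project_sparse(W.detach(), keep_ratio)         # final hard prune


# Toy usage: prune a weight matrix for a least-squares objective.
torch.manual_seed(0)
X, y = torch.randn(64, 20), torch.randn(64, 5)
W = torch.randn(20, 5, requires_grad=True)
W_pruned = admm_prune(lambda w: ((X @ w - y) ** 2).mean(), W)
print((W_pruned != 0).float().mean())  # ~0.1 density
```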
PIMBALL: Binary Neural Networks in Spintronic Memory
Neural networks span a wide range of applications of industrial and
commercial significance. Binary neural networks (BNN) are particularly
effective in trading accuracy for performance, energy efficiency or
hardware/software complexity. Here, we introduce a spintronic, re-configurable
in-memory BNN accelerator, PIMBALL: Processing In Memory BNN AcceL(L)erator,
which allows for massively parallel and energy efficient computation. PIMBALL
is capable of being used as a standard spintronic memory (STT-MRAM) array and a
computational substrate simultaneously. We evaluate PIMBALL using multiple
image classifiers and a genomics kernel. Our simulation results show that
PIMBALL is more energy efficient than alternative CPU, GPU, and FPGA based
implementations while delivering higher throughput.
Bitwise Neural Networks
Based on the assumption that there exists a neural network that efficiently
represents a set of Boolean functions between all binary inputs and outputs, we
propose a process for developing and deploying neural networks whose weight
parameters, bias terms, input, and intermediate hidden layer output signals,
are all binary-valued, and require only basic bit logic for the feedforward
pass. The proposed Bitwise Neural Network (BNN) is especially suitable for
resource-constrained environments, since it replaces either floating or
fixed-point arithmetic with significantly more efficient bitwise operations.
Hence, the BNN requires less spatial complexity, less memory bandwidth, and
less power consumption in hardware. In order to design such networks, we
propose to add a few training schemes, such as weight compression and noisy
backpropagation, which result in a bitwise network that performs almost as well
as its corresponding real-valued network. We test the proposed network on the
MNIST dataset, represented using binary features, and show that BNNs result in
competitive performance while offering dramatic computational savings.
Comment: This paper was presented at the International Conference on Machine
Learning (ICML) Workshop on Resource-Efficient Machine Learning, Lille,
France, Jul. 6-11, 2015.
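A small sketch of why only basic bit logic is needed in the feedforward pass (the paper's weight-compression and noisy-backpropagation training schemes are not shown): with inputs and weights in {-1,+1} encoded as bits, a dot product reduces to XNOR followed by a popcount.

```python
# Hedged sketch: binary dot product via XNOR + popcount, dot = 2*popcount - n.
import numpy as np

n = 8
x_bits = np.random.randint(0, 2, n, dtype=np.uint8)   # 1 encodes +1, 0 encodes -1
w_bits = np.random.randint(0, 2, n, dtype=np.uint8)

# Reference dot product in the +/-1 domain.
x_pm, w_pm = 2 * x_bits.astype(int) - 1, 2 * w_bits.astype(int) - 1
dot_ref = int(np.dot(x_pm, w_pm))

# Bitwise version: XNOR counts matching bit positions, then rescale.
matches = int(np.sum(~(x_bits ^ w_bits) & 1))          # popcount of XNOR
dot_bitwise = 2 * matches - n

assert dot_ref == dot_bitwise
print(dot_ref, dot_bitwise)
```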
ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware
Neural architecture search (NAS) has a great impact by automatically
designing effective neural network architectures. However, the prohibitive
computational demand of conventional NAS algorithms (e.g. $10^4$ GPU hours)
makes it difficult to \emph{directly} search the architectures on large-scale
tasks (e.g. ImageNet). Differentiable NAS can reduce the cost of GPU hours via
a continuous representation of network architecture but suffers from the high
GPU memory consumption issue (it grows linearly w.r.t. candidate set size). As a
result, they need to utilize~\emph{proxy} tasks, such as training on a smaller
dataset, or learning with only a few blocks, or training just for a few epochs.
These architectures optimized on proxy tasks are not guaranteed to be optimal
on the target task. In this paper, we present \emph{ProxylessNAS} that can
\emph{directly} learn the architectures for large-scale target tasks and target
hardware platforms. We address the high memory consumption issue of
differentiable NAS and reduce the computational cost (GPU hours and GPU memory)
to the same level of regular training while still allowing a large candidate
set. Experiments on CIFAR-10 and ImageNet demonstrate the effectiveness of
directness and specialization. On CIFAR-10, our model achieves 2.08\% test
error with only 5.7M parameters, better than the previous state-of-the-art
architecture AmoebaNet-B, while using 6x fewer parameters. On ImageNet,
our model achieves 3.1\% better top-1 accuracy than MobileNetV2, while being
1.2x faster with measured GPU latency. We also apply ProxylessNAS to
specialize neural architectures for hardware with direct hardware metrics (e.g.
latency) and provide insights for efficient CNN architecture design.
Comment: ICLR 2019.
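A hedged sketch of the path-binarization idea behind memory-efficient differentiable NAS (this uses a simplified single-path gradient trick rather than the paper's exact estimator; the candidate kernel sizes and module names are assumptions):

```python
# Hedged sketch: each edge keeps architecture logits over candidate ops, but
# only one sampled path is active per forward pass, so memory stays at the
# level of a single network rather than growing with the candidate set.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BinarizedMixedOp(nn.Module):
    def __init__(self, channels, candidates=(3, 5, 7)):
        super().__init__()
        self.ops = nn.ModuleList(
            nn.Conv2d(channels, channels, k, padding=k // 2) for k in candidates
        )
        self.alpha = nn.Parameter(torch.zeros(len(candidates)))  # arch params

    def forward(self, x):
        probs = F.softmax(self.alpha, dim=0)
        idx = torch.multinomial(probs, 1).item()   # binarize: one active path
        # Multiplying by (prob - prob.detach() + 1) keeps the output equal to
        # ops[idx](x) in value while letting gradients reach the arch logits.
        return self.ops[idx](x) * (probs[idx] - probs[idx].detach() + 1.0)


op = BinarizedMixedOp(8)
y = op(torch.randn(1, 8, 16, 16))
print(y.shape)  # torch.Size([1, 8, 16, 16])
```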
Memory-Driven Mixed Low Precision Quantization For Enabling Deep Network Inference On Microcontrollers
This paper presents a novel end-to-end methodology for enabling the
deployment of low-error deep networks on microcontrollers. To fit the memory
and computational limitations of resource-constrained edge-devices, we exploit
mixed low-bitwidth compression, featuring 8, 4 or 2-bit uniform quantization,
and we model the inference graph with integer-only operations. Our approach
aims at determining the minimum bit precision of every activation and weight
tensor given the memory constraints of a device. This is achieved through a
rule-based iterative procedure, which cuts the number of bits of the most
memory-demanding layers, aiming at meeting the memory constraints. After a
quantization-aware retraining step, the fake-quantized graph is converted into
an inference integer-only model by inserting the Integer Channel-Normalization
(ICN) layers, which introduce a negligible loss as demonstrated on INT4
MobilenetV1 models. We report the latency-accuracy evaluation of
mixed-precision MobilenetV1 family networks on an STM32H7 microcontroller. Our
experimental results demonstrate an end-to-end deployment of an integer-only
Mobilenet network with a Top1 accuracy of 68% on a device with only 2MB of FLASH
memory and 512kB of RAM, improving the Top1 accuracy by 8% with respect to
previously published 8-bit implementations for microcontrollers.
Comment: Submitted to NeurIPS 2019.
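A rough sketch of the rule-based bitwidth search described above (assuming a simple greedy rule over weight memory only; the function name, bit choices, and budget are illustrative, not the paper's procedure):

```python
# Hedged sketch: start every layer at 8 bits and repeatedly halve the precision
# of the most memory-hungry layer until the weight memory fits the budget.
def assign_bitwidths(layer_sizes, budget_bits, choices=(8, 4, 2)):
    """layer_sizes: dict name -> number of weights. Returns name -> bits."""
    bits = {name: choices[0] for name in layer_sizes}
    total = lambda: sum(layer_sizes[n] * bits[n] for n in layer_sizes)
    while total() > budget_bits:
        # Pick the layer currently using the most memory that can still be cut.
        candidates = [n for n in layer_sizes if bits[n] > choices[-1]]
        if not candidates:
            raise ValueError("budget not reachable even at minimum precision")
        worst = max(candidates, key=lambda n: layer_sizes[n] * bits[n])
        bits[worst] = choices[choices.index(bits[worst]) + 1]
    return bits


# Toy example: three layers, 1 MB (8 Mbit) weight budget.
sizes = {"conv1": 200_000, "conv2": 600_000, "fc": 400_000}
print(assign_bitwidths(sizes, budget_bits=8 * 2**20))
```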
Recent Advances in Efficient Computation of Deep Convolutional Neural Networks
Deep neural networks have evolved remarkably over the past few years and they
are currently the fundamental tools of many intelligent systems. At the same
time, the computational complexity and resource consumption of these networks
also continue to increase. This will pose a significant challenge to the
deployment of such networks, especially in real-time applications or on
resource-limited devices. Thus, network acceleration has become a hot topic
within the deep learning community. As for hardware implementation of deep
neural networks, a batch of accelerators based on FPGA/ASIC have been proposed
in recent years. In this paper, we provide a comprehensive survey of recent
advances in network acceleration, compression and accelerator design from both
algorithm and hardware points of view. Specifically, we provide a thorough
analysis of each of the following topics: network pruning, low-rank
approximation, network quantization, teacher-student networks, compact network
design and hardware accelerators. Finally, we will introduce and discuss a few
possible future directions.
Comment: 14 pages, 3 figures.
Regularizing Activation Distribution for Training Binarized Deep Networks
Binarized Neural Networks (BNNs) can significantly reduce the inference
latency and energy consumption in resource-constrained devices due to their
pure-logical computation and fewer memory accesses. However, training BNNs is
difficult since the activation flow encounters degeneration, saturation, and
gradient mismatch problems. Prior work alleviates these issues by increasing
activation bits and adding floating-point scaling factors, thereby sacrificing
BNN's energy efficiency. In this paper, we propose to use distribution loss to
explicitly regularize the activation flow, and develop a framework to
systematically formulate the loss. Our experiments show that the distribution
loss can consistently improve the accuracy of BNNs without losing their energy
benefits. Moreover, equipped with the proposed regularization, BNN training is
shown to be robust to the selection of hyper-parameters including optimizer and
learning rate.
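As a hedged sketch (not the paper's exact formulation), one simple instance of such a distribution loss penalizes pre-activation statistics associated with degeneration (a drifted mean) and saturation (values beyond the clipping range), and is added to the task loss with a small weight:

```python
# Hedged sketch of a distribution loss regularizing pre-binarization activations.
import torch


def distribution_loss(pre_act: torch.Tensor, sat: float = 1.0) -> torch.Tensor:
    """pre_act: pre-binarization activations of one layer (any shape)."""
    degeneration = pre_act.mean().pow(2)                              # drifted mean
    saturation = torch.clamp(pre_act.abs() - sat, min=0).pow(2).mean()  # clipped tail
    return degeneration + saturation


# Added to the task loss during binary-network training.
pre_act = torch.randn(32, 64) * 3.0
task_loss = torch.tensor(0.7)
total_loss = task_loss + 0.1 * distribution_loss(pre_act)
print(total_loss.item())
```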