On Study of the Binarized Deep Neural Network for Image Classification
Recently, the deep neural network (derived from the artificial neural
network) has attracted many researchers' attention by its outstanding
performance. However, since this network requires high-performance GPUs and
large storage, it is very hard to use it on individual devices. In order to
improve the deep neural network, many trials have been made by refining the
network structure or training strategy. Unlike those trials, in this paper, we
focused on the basic propagation function of the artificial neural network and
proposed the binarized deep neural network. This network is a pure binary
system, in which all the values and calculations are binarized. As a result,
our network can save a lot of computational resource and storage. Therefore, it
is possible to use it on various devices. Moreover, the experimental results
proved the feasibility of the proposed network.
Comment: 9 pages, 6 figures. Rejected conference (CVPR 2015) submission. Submission date: November 2014. This work is patented in China (No. 201410647710.3).
Binarized Convolutional Neural Networks for Efficient Inference on GPUs
Convolutional neural networks have recently achieved significant
breakthroughs in various image classification tasks. However, they are
computationally expensive, which can make their feasible implementation on
embedded and low-power devices difficult. In this paper, convolutional neural
network binarization is implemented on GPU-based platforms for real-time
inference on resource constrained devices. In binarized networks, all weights
and intermediate computations between layers are quantized to +1 and -1,
allowing multiplications and additions to be replaced with bit-wise operations
between 32-bit words. This representation completely eliminates the need for
floating point multiplications and additions and decreases both the
computational load and the memory footprint compared to a full-precision
network implemented in floating point, making it well-suited for
resource-constrained environments. We compare the performance of our
implementation with an equivalent floating point implementation on one desktop
and two embedded GPU platforms. Our implementation achieves a maximum speedup
of 7.4x with only a 4.4% loss in accuracy compared to a reference
implementation.
Comment: IEEE EUSIPCO 201
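The bit-wise replacement for multiply-accumulate described above can be sketched in plain Python. This is a hedged illustration of the arithmetic only, not the paper's GPU kernels; `pack_bits` and `binary_dot` are hypothetical names:

```python
def pack_bits(values):
    """Pack a list of +1/-1 values into 32-bit words (+1 -> bit 1, -1 -> bit 0)."""
    words = []
    for i in range(0, len(values), 32):
        w = 0
        for j, v in enumerate(values[i:i + 32]):
            if v == 1:
                w |= 1 << j
        words.append(w)
    return words

def binary_dot(a_words, b_words, n):
    """Dot product of two {+1,-1} vectors of length n from packed words.
    XNOR marks matching bits; each match contributes +1, each mismatch -1."""
    matches = 0
    for wa, wb in zip(a_words, b_words):
        xnor = ~(wa ^ wb) & 0xFFFFFFFF   # 32-bit XNOR
        matches += bin(xnor).count("1")  # population count
    # Unused bits past n in the last word are packed as 0 on both sides and
    # therefore show up as spurious matches; subtract them out.
    matches -= (-n) % 32
    return 2 * matches - n
```

One XNOR plus one popcount over a 32-bit word thus stands in for 32 floating-point multiply-adds, which is where the speedup comes from.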
Verifying Properties of Binarized Deep Neural Networks
Understanding properties of deep neural networks is an important challenge in
deep learning. In this paper, we take a step in this direction by proposing a
rigorous way of verifying properties of a popular class of neural networks,
Binarized Neural Networks, using the well-developed means of Boolean
satisfiability. Our main contribution is a construction that creates a
representation of a binarized neural network as a Boolean formula. Our encoding
is the first exact Boolean representation of a deep neural network. Using this
encoding, we leverage the power of modern SAT solvers along with a proposed
counterexample-guided search procedure to verify various properties of these
networks. A particular focus will be on the critical property of robustness to
adversarial perturbations. For this property, our experimental results
demonstrate that our approach scales to medium-size deep neural networks used
in image classification tasks. To the best of our knowledge, this is the first
work on verifying properties of deep neural networks using an exact Boolean
encoding of the network.
Comment: 10 pages
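The core idea above, that a binarized neuron is exactly a Boolean function, can be illustrated with a tiny sketch. Exhaustive enumeration here stands in for the paper's SAT query; `bnn_neuron` and `robust_to_one_flip` are hypothetical names, not the paper's encoding:

```python
def bnn_neuron(weights, bias, x_bits):
    """A binarized neuron viewed as a Boolean function: weights are in
    {+1, -1}, each input bit encodes +1 (True) or -1 (False), and the
    output is True iff the pre-activation is non-negative."""
    s = sum(w * (1 if b else -1) for w, b in zip(weights, x_bits)) + bias
    return s >= 0

def robust_to_one_flip(weights, bias, x_bits):
    """Exhaustive check (standing in for a SAT query on the exact Boolean
    encoding) that no single-bit input flip changes the neuron's output."""
    ref = bnn_neuron(weights, bias, x_bits)
    for i in range(len(x_bits)):
        flipped = list(x_bits)
        flipped[i] = not flipped[i]
        if bnn_neuron(weights, bias, flipped) != ref:
            return False  # a counterexample perturbation exists
    return True
```

Because the function is exactly Boolean, a SAT solver can answer the same question symbolically for whole networks, which enumeration cannot scale to.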
Minimizing Classification Energy of Binarized Neural Network Inference for Wearable Devices
In this paper, we propose a low-power hardware for efficient deployment of
binarized neural networks (BNNs) that have been trained for physiological
datasets. BNNs constrain weights and feature-map to 1 bit, can pack in as many
1-bit weights as the width of a memory entry provides, and can execute multiple
multiply-accumulate (MAC) operations with one fused bit-wise xnor and
population-count instruction over aligned packed entries. Our proposed hardware
is scalable with the number of processing engines (PEs) and the memory width,
both of which are adjustable for the most energy-efficient configuration given an
application. We implement two real case studies including Physical Activity
Monitoring and Stress Detection on our platform, and for each case study on the
target platform, we seek the optimal PE and memory configurations. Our
implementation results indicate that a good choice of memory width and number
of PEs can reduce energy consumption by up to 4x and 2.5x on an Artix-7 FPGA
and a 65nm CMOS ASIC implementation, respectively.
We also show that, generally, wider memories make more efficient BNN processing
hardware. To further reduce the energy, we introduce Pool-Skipping technique
that can skip at least 25% of the operations that are accompanied by a Max-Pool
layer in BNNs, leading to a total of 22% operation reduction in the Stress
Detection case study. Compared to related works using the same case studies
on the same target platform and with the same classification accuracy, our
hardware is 4.5x and 250x more energy efficient for Stress Detection on FPGA
and Physical Activity Monitoring on ASIC, respectively.
Comment: Accepted and presented at the 20th International Symposium on Quality Electronic Design (ISQED 2019) on March 7th, 2019, Santa Clara, CA, US
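The observation behind Pool-Skipping can be shown with a toy function. This only illustrates the underlying point, that +1 is the maximum a binarized activation can take, so a max-pool window is decided as soon as a +1 appears; the paper's technique skips the operations feeding the Max-Pool layer, and `maxpool_with_skipping` is a hypothetical name:

```python
def maxpool_with_skipping(window):
    """Max-pool over binarized activations in {+1, -1}. Since +1 is the
    largest possible value, evaluation stops at the first +1 and the rest
    of the window is skipped. Returns (pooled value, positions examined)."""
    visited = 0
    for v in window:
        visited += 1
        if v == 1:
            return 1, visited  # early exit: remaining operations skipped
    return -1, visited         # all -1: the full window had to be examined
```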
Adjustable Bounded Rectifiers: Towards Deep Binary Representations
Binary representation is desirable for its memory efficiency, computation
speed and robustness. In this paper, we propose adjustable bounded rectifiers
to learn binary representations for deep neural networks. While hard
constraining representations across layers to be binary makes training
unreasonably difficult, we softly encourage activations to diverge from real
values to binary by approximating step functions. Our final representation is
completely binary. We test our approach on MNIST, CIFAR10, and ILSVRC2012
dataset, and systematically study the training dynamics of the binarization
process. Our approach can binarize the last layer representation without loss
of performance and binarize all the layers with reasonably small degradations.
The memory space that it saves may allow more sophisticated models to be
deployed, thus compensating for the loss. To the best of our knowledge, this is the
first work to report results on current deep network architectures using
complete binary middle representations. Given the learned representations, we
find that the firing or inhibition of a binary neuron is usually associated
with a meaningful interpretation across different classes. This suggests that
the semantic structure of a neural network may be manifested through a guided
binarization process.
Comment: Under review as a conference paper at ICLR 201
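The soft approach to binarization described above can be sketched with a generic bounded rectifier. The paper learns its parameters per unit and its exact parametrization may differ; this minimal form only shows how an adjustable slope interpolates between a linear ramp and a 0/1 step function:

```python
def bounded_rectifier(x, slope):
    """clip(slope * x, 0, 1): a bounded rectifier whose slope controls how
    closely it approximates a 0/1 step function. Gradually increasing (or
    learning) the slope softly pushes activations toward binary values
    instead of hard-constraining them from the start."""
    return min(max(slope * x, 0.0), 1.0)
```

With a small slope the unit behaves like a clipped linear activation and remains easy to train; as the slope grows, outputs saturate at 0 or 1 and the representation becomes effectively binary.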
Matrix and tensor decompositions for training binary neural networks
This paper is on improving the training of binary neural networks in which
both activations and weights are binary. While prior methods for neural network
binarization binarize each filter independently, we propose to instead
parametrize the weight tensor of each layer using matrix or tensor
decomposition. The binarization process is then performed using this latent
parametrization, via a quantization function (e.g. sign function) applied to
the reconstructed weights. A key feature of our method is that while the
reconstruction is binarized, the computation in the latent factorized space is
done in the real domain. This has several advantages: (i) the latent
factorization enforces a coupling of the filters before binarization, which
significantly improves the accuracy of the trained models. (ii) while at
training time, the binary weights of each convolutional layer are parametrized
using real-valued matrix or tensor decomposition, during inference we simply
use the reconstructed (binary) weights. As a result, our method does not
sacrifice any advantage of binary networks in terms of model compression and
speeding-up inference. As a further contribution, instead of computing the
binary weight scaling factors analytically, as in prior work, we propose to
learn them discriminatively via back-propagation. Finally, we show that our
approach significantly outperforms existing methods when tested on the
challenging tasks of (a) human pose estimation (more than 4% improvements) and
(b) ImageNet classification (up to 5% performance gains).
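The latent-parametrization idea above can be sketched in a few lines. This is a hedged illustration, not the paper's training procedure: the straight-through gradient estimation, the treatment of sign(0), and the discriminative learning of the scaling factor are all omitted, and `binarize_factorized` is a hypothetical name:

```python
import numpy as np

def binarize_factorized(U, V, alpha):
    """Binary weights from a real-valued latent factorization:
    W_b = alpha * sign(U @ V). The matrix product couples the filters in
    the real domain during training; at inference only the reconstructed
    binary W_b (plus the scalar alpha) is needed, so none of the binary
    network's compression or speed advantages are sacrificed."""
    return alpha * np.sign(U @ V)
```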
Improved training of binary networks for human pose estimation and image recognition
Big neural networks trained on large datasets have advanced the
state-of-the-art for a large variety of challenging problems, improving
performance by a large margin. However, under low memory and limited
computational power constraints, the accuracy on the same problems drops
considerably. In this paper, we propose a series of techniques that
significantly improve the accuracy of binarized neural networks (i.e., networks
where both the features and the weights are binary). We evaluate the proposed
improvements on two diverse tasks: fine-grained recognition (human pose
estimation) and large-scale image recognition (ImageNet classification).
Specifically, we introduce a series of novel methodological changes including:
(a) more appropriate activation functions, (b) reverse-order initialization,
(c) progressive quantization, and (d) network stacking, and show that these
additions significantly improve existing state-of-the-art network binarization
techniques. Additionally, for the first time, we also investigate the extent
to which network binarization and knowledge distillation can be combined. When
tested on the challenging MPII dataset, our method shows a performance
improvement of more than 4% in absolute terms. Finally, we further validate our
findings by applying the proposed techniques for large-scale object recognition
on the ImageNet dataset, on which we report a 4% reduction in error rate.
Build a Compact Binary Neural Network through Bit-level Sensitivity and Data Pruning
Convolutional neural network (CNN) has been widely used for vision-based
tasks. Due to the high computational complexity and memory storage requirement,
it is hard to directly deploy a full-precision CNN on embedded devices.
Hardware-friendly designs are needed for resource-limited and
energy-constrained embedded devices. Emerging solutions have been adopted for
neural network compression, e.g., binary/ternary weight networks, pruned
networks, and quantized networks. Among them, the Binarized Neural Network
(BNN) is believed to be the most hardware-friendly framework due to its small
network size and low computational complexity. No existing work has further
shrunk the size of BNNs. In this work, we explore the redundancy in BNNs and
build a compact BNN (CBNN) based on bit-level sensitivity analysis and
bit-level data pruning. The input data is converted to a high-dimensional
bit-sliced format. In the post-training stage, we analyze the impact of
different bit slices on the accuracy. By pruning redundant input bit slices
and shrinking the network size, we are able to build a more compact BNN. Our
results show that we can further scale down the network size of the BNN by up
to 3.9x with no more than a 1% accuracy drop. The actual runtime can be
reduced by up to 2x and 9.9x compared with the baseline BNN and its
full-precision counterpart, respectively.
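The bit-sliced input format above can be sketched for 8-bit data. This is a minimal illustration under the assumption that slicing is per bit-plane of the integer input; `bit_slices` and `prune_slices` are hypothetical names, and the paper's sensitivity analysis for choosing which slices to drop is not shown:

```python
import numpy as np

def bit_slices(x_uint8):
    """Split 8-bit input data into 8 binary bit-planes, most significant
    first, mimicking a high-dimensional bit-sliced input format."""
    x = np.asarray(x_uint8, dtype=np.uint8)
    return np.stack([(x >> b) & 1 for b in range(7, -1, -1)])

def prune_slices(planes, keep):
    """Keep only the `keep` most significant bit-planes; the low-order
    planes are the natural candidates for redundancy pruning."""
    return planes[:keep]
```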
The High-Dimensional Geometry of Binary Neural Networks
Recent research has shown that one can train a neural network with binary
weights and activations at train time by augmenting the weights with a
high-precision continuous latent variable that accumulates small changes from
stochastic gradient descent. However, there is a dearth of theoretical analysis
to explain why we can effectively capture the features in our data with binary
weights and activations. Our main result is that the neural networks with
binary weights and activations trained using the method of Courbariaux, Hubara
et al. (2016) work because of the high-dimensional geometry of binary vectors.
In particular, the ideal continuous vectors that extract out features in the
intermediate representations of these BNNs are well-approximated by binary
vectors in the sense that dot products are approximately preserved. Compared to
previous research that demonstrated the viability of such BNNs, our work
explains why these BNNs work in terms of the HD geometry. Our theory serves as
a foundation for understanding not only BNNs but a variety of methods that seek
to compress traditional neural networks. Furthermore, a better understanding of
multilayer binary neural networks serves as a starting point for generalizing
BNNs to other neural network architectures such as recurrent neural networks.
Comment: 12 pages, 4 figures
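The claim that binary vectors approximately preserve dot products with the ideal continuous directions can be checked numerically. A minimal sketch, assuming i.i.d. Gaussian vectors (the function name is hypothetical): the cosine similarity between a vector and its sign concentrates near sqrt(2/pi) ≈ 0.798 in high dimensions.

```python
import numpy as np

def mean_cosine_after_binarization(dim, trials=200, seed=0):
    """Average cosine similarity between random Gaussian vectors and their
    sign-binarized versions. In high dimensions this concentrates near
    sqrt(2/pi): binarization roughly preserves directions, and hence dot
    products up to a common scale."""
    rng = np.random.default_rng(seed)
    sims = []
    for _ in range(trials):
        v = rng.standard_normal(dim)
        b = np.sign(v)
        sims.append(v @ b / (np.linalg.norm(v) * np.linalg.norm(b)))
    return float(np.mean(sims))
```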
LP-3DCNN: Unveiling Local Phase in 3D Convolutional Neural Networks
Traditional 3D Convolutional Neural Networks (CNNs) are computationally
expensive, memory intensive, prone to overfit, and most importantly, there is a
need to improve their feature learning capabilities. To address these issues,
we propose Rectified Local Phase Volume (ReLPV) block, an efficient alternative
to the standard 3D convolutional layer. The ReLPV block extracts the phase in a
3D local neighborhood (e.g., 3x3x3) of each position of the input map to obtain
the feature maps. The phase is extracted by computing 3D Short Term Fourier
Transform (STFT) at multiple fixed low frequency points in the 3D local
neighborhood of each position. These feature maps at different frequency points
are then linearly combined after passing them through an activation function.
The ReLPV block provides significant parameter savings of at least 3^3 to 13^3
times compared to the standard 3D convolutional layer with the filter sizes
3x3x3 to 13x13x13, respectively. We show that the feature learning capabilities
of the ReLPV block are significantly better than the standard 3D convolutional
layer. Furthermore, it produces consistently better results across different 3D
data representations. We achieve state-of-the-art accuracy on the volumetric
ModelNet10 and ModelNet40 datasets while utilizing only 11% parameters of the
current state-of-the-art. We also improve the state-of-the-art on the UCF-101
split-1 action recognition dataset by 5.68% (when trained from scratch) while
using only 15% of the parameters of the state-of-the-art. The project webpage
is available at https://sites.google.com/view/lp-3dcnn/home.
Comment: Accepted in CVPR 201
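The phase extraction step described above can be sketched for a single voxel. This is a hedged illustration of one 3D STFT coefficient over a 3x3x3 neighborhood, assuming offsets in {-1, 0, 1}^3; the function name is hypothetical, and the paper's choice of frequency points, linear combination, and activation are not shown:

```python
import numpy as np
from itertools import product

def local_stft_coefficient(neigh, freq):
    """One 3D STFT coefficient of a 3x3x3 neighborhood `neigh` at a fixed
    low-frequency point `freq` = (fx, fy, fz), i.e. the correlation with
    exp(-2*pi*i * f.n) over offsets n in {-1, 0, 1}^3. ReLPV keeps the real
    and imaginary parts (the local phase) of a few such coefficients as
    feature channels."""
    coef = 0j
    for nx, ny, nz in product((-1, 0, 1), repeat=3):
        phase = -2j * np.pi * (freq[0] * nx + freq[1] * ny + freq[2] * nz)
        coef += neigh[nx + 1, ny + 1, nz + 1] * np.exp(phase)
    return coef
```

Because each frequency point reuses the same fixed exponential basis, only the small linear combination across frequency points carries learned parameters, which is where the large parameter savings over a dense 3D convolution come from.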