Binarized Neural Architecture Search for Efficient Object Recognition
Traditional neural architecture search (NAS) has had a significant impact on
computer vision by automatically designing network architectures for various
tasks. In this paper, binarized neural architecture search (BNAS), with a
search space of binarized convolutions, is introduced to produce extremely
compressed models that reduce the huge computational cost of edge computing on
embedded devices. BNAS is more challenging than NAS because of the learning
inefficiency caused by the optimization requirements and the huge architecture
space, and because of the performance loss when handling wild data in various
computing applications. To address these issues, we introduce operation space
reduction and channel sampling into BNAS to significantly reduce the cost of
searching. This is accomplished through a performance-based strategy that is
robust to wild data and is further used to abandon operations with less
potential. Furthermore, we introduce the Upper Confidence Bound (UCB) to solve
1-bit BNAS. Two optimization methods for binarized neural networks are used to
validate the effectiveness of our BNAS. Extensive experiments demonstrate that
the proposed BNAS achieves performance comparable to NAS on both the CIFAR and
ImageNet databases. Comparable accuracy is achieved on the CIFAR-10 dataset,
but with a significantly compressed model and a faster search than the
state-of-the-art PC-DARTS. On the wild face recognition task, our binarized
models achieve a performance similar to their corresponding full-precision
models.
Comment: arXiv admin note: substantial text overlap with arXiv:1911.1086
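As a rough illustration of how a UCB criterion can guide operation pruning
during search, here is a minimal Python sketch; the operation names, the
reward model (a short-training validation score, faked with random numbers),
and the pruning schedule are illustrative assumptions, not the paper's exact
procedure.

import math
import random

# Candidate operations in a hypothetical binarized search space.
OPS = ["bin_conv_3x3", "bin_conv_5x5", "bin_dil_conv_3x3", "skip", "avg_pool_3x3"]

counts = {op: 0 for op in OPS}
rewards = {op: 0.0 for op in OPS}

def ucb_score(op, total, c=1.0):
    if counts[op] == 0:
        return float("inf")          # force each op to be tried once
    mean = rewards[op] / counts[op]
    return mean + c * math.sqrt(2.0 * math.log(total) / counts[op])

def evaluate(op):
    # placeholder for "train briefly with this op and measure accuracy"
    return random.random()

active = list(OPS)
for step in range(1, 200):
    op = max(active, key=lambda o: ucb_score(o, step))
    counts[op] += 1
    rewards[op] += evaluate(op)
    # periodically abandon the operation with the lowest UCB score
    if step % 50 == 0 and len(active) > 2:
        active.remove(min(active, key=lambda o: ucb_score(o, step)))

print("surviving operations:", active)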
Verifying Properties of Binarized Deep Neural Networks
Understanding properties of deep neural networks is an important challenge in
deep learning. In this paper, we take a step in this direction by proposing a
rigorous way of verifying properties of a popular class of neural networks,
Binarized Neural Networks, using the well-developed means of Boolean
satisfiability. Our main contribution is a construction that creates a
representation of a binarized neural network as a Boolean formula. Our encoding
is the first exact Boolean representation of a deep neural network. Using this
encoding, we leverage the power of modern SAT solvers along with a proposed
counterexample-guided search procedure to verify various properties of these
networks. A particular focus is the critical property of robustness to
adversarial perturbations. For this property, our experimental results
demonstrate that our approach scales to medium-size deep neural networks used
in image classification tasks. To the best of our knowledge, this is the first
work on verifying properties of deep neural networks using an exact Boolean
encoding of the network.
Comment: 10 pages
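To make the reduction concrete, a binarized neuron with weights and inputs in
{-1, +1} can be viewed as a Boolean cardinality constraint: the neuron fires
exactly when enough input bits agree with the weight bits. The Python sketch
below illustrates this view, with brute-force enumeration standing in for a
SAT solver; it is not the paper's actual CNF construction.

import itertools
import math

def neuron_fires(x_bits, w_bits, bias):
    """sign(w . x + bias) >= 0, with w, x in {-1, +1} encoded as Booleans."""
    n = len(x_bits)
    agreements = sum(xb == wb for xb, wb in zip(x_bits, w_bits))
    # w . x = 2*agreements - n, so the neuron fires iff
    # agreements >= ceil((n - bias) / 2): a pure cardinality constraint.
    return agreements >= math.ceil((n - bias) / 2)

# Brute-force "verification" of a toy property: does any input make the
# neuron fire? (A SAT solver answers this without enumeration.)
w = [True, False, True, True]
if any(neuron_fires(x, w, bias=0) for x in itertools.product([False, True], repeat=4)):
    print("SAT: some input activates the neuron")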
FINN-L: Library Extensions and Design Trade-off Analysis for Variable Precision LSTM Networks on FPGAs
It is well known that many types of artificial neural networks, including
recurrent networks, can achieve a high classification accuracy even with
low-precision weights and activations. The reduction in precision generally
yields much more efficient hardware implementations with regard to hardware
cost, memory requirements, energy, and achievable throughput. In this paper, we
present the first systematic exploration of this design space as a function of
precision for a Bidirectional Long Short-Term Memory (BiLSTM) neural network.
Specifically, we include an in-depth investigation of precision vs. accuracy
using a fully hardware-aware training flow, where quantization of all aspects
of the network, including weights, inputs, outputs, and in-memory cell
activations, is taken into consideration during training. In addition, hardware resource
cost, power consumption, and throughput scalability are explored as a function
of precision for FPGA-based implementations of BiLSTM, and multiple approaches
of parallelizing the hardware. We provide the first open-source HLS library
extension of FINN for parameterizable hardware architectures of LSTM layers on
FPGAs, offering full precision flexibility and parameterizable performance
scaling through different levels of parallelism within the architecture. Based
on this library, we present an FPGA-based accelerator for a
BiLSTM neural network designed for optical character recognition, along with
numerous other experimental proof points for a Zynq UltraScale+ XCZU7EV MPSoC
within the given design space.
Comment: Accepted for publication, 28th International Conference on Field Programmable Logic and Applications (FPL), August 2018, Dublin, Ireland
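For intuition about the precision/accuracy trade-off explored here, the
following sketch applies a symmetric uniform "fake" quantizer to a weight
matrix at several bit widths; the quantizer form and the [-1, 1] range are
assumptions, and the straight-through backward pass used in hardware-aware
training is omitted.

import numpy as np

def quantize(x, bits):
    # Symmetric uniform quantizer: e.g. 3 bits -> levels {-3..3}/3.
    levels = 2 ** (bits - 1) - 1
    x = np.clip(x, -1.0, 1.0)
    return np.round(x * levels) / levels

w = np.random.uniform(-1, 1, size=(4, 4))
for bits in (8, 4, 2):
    err = np.abs(w - quantize(w, bits)).mean()
    print(f"{bits}-bit weights, mean quantization error: {err:.4f}")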
Adjustable Bounded Rectifiers: Towards Deep Binary Representations
Binary representation is desirable for its memory efficiency, computation
speed and robustness. In this paper, we propose adjustable bounded rectifiers
to learn binary representations for deep neural networks. While
hard-constraining representations across layers to be binary makes training
unreasonably difficult, we instead softly encourage activations to move from
real values toward binary ones by approximating step functions. Our final representation is
completely binary. We test our approach on MNIST, CIFAR10, and ILSVRC2012
datasets, and systematically study the training dynamics of the binarization
process. Our approach can binarize the last layer representation without loss
of performance and binarize all the layers with reasonably small degradations.
The memory space that it saves may allow more sophisticated models to be
deployed, thus compensating for the loss. To the best of our knowledge, this is the
first work to report results on current deep network architectures using
completely binary intermediate representations. Given the learned representations, we
find that the firing or inhibition of a binary neuron is usually associated
with a meaningful interpretation across different classes. This suggests that
the semantic structure of a neural network may be manifested through a guided
binarization process.
Comment: Under review as a conference paper at ICLR 2016
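A minimal sketch of the idea of softly approximating a step function: a
bounded rectifier clamp(a*x, 0, 1) approaches a hard step as the slope a
grows. The exact parameterization and annealing schedule of the paper's
adjustable bounded rectifier may differ; this particular form is an
assumption.

import numpy as np

def bounded_rectifier(x, slope):
    # Bounded rectifier: linear ramp clipped to [0, 1].
    return np.clip(slope * x, 0.0, 1.0)

x = np.linspace(-2, 2, 9)
for slope in (1.0, 4.0, 64.0):          # larger slope -> closer to a step
    print(f"slope={slope:6.1f}:", np.round(bounded_rectifier(x, slope), 2))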
Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1
We introduce a method to train Binarized Neural Networks (BNNs) - neural
networks with binary weights and activations at run-time. At training time, the
binary weights and activations are used when computing the parameter gradients.
During the forward pass, BNNs drastically reduce memory size and accesses, and
replace most arithmetic operations with bit-wise operations, which is expected
to substantially improve power-efficiency. To validate the effectiveness of
BNNs we conduct two sets of experiments on the Torch7 and Theano frameworks. On
both, BNNs achieved nearly state-of-the-art results over the MNIST, CIFAR-10
and SVHN datasets. Last but not least, we wrote a binary matrix multiplication
GPU kernel with which it is possible to run our MNIST BNN 7 times faster than
with an unoptimized GPU kernel, without suffering any loss in classification
accuracy. The code for training and running our BNNs is available on-line.
Comment: 11 pages and 3 figures
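The standard trick for training with binary weights and activations, as used
by BNNs, is sign() in the forward pass with a straight-through gradient
estimator in the backward pass. A minimal NumPy sketch, with NumPy standing in
for an autograd framework:

import numpy as np

def binarize_forward(x):
    # Forward pass: hard sign, mapping reals to {-1, +1}.
    return np.where(x >= 0, 1.0, -1.0)

def binarize_backward(x, grad_out):
    # Straight-through estimator: pass the gradient where |x| <= 1,
    # zero it elsewhere (the "clipped identity" surrogate).
    return grad_out * (np.abs(x) <= 1.0)

w_real = np.array([0.3, -1.7, 0.0, -0.2])
w_bin = binarize_forward(w_real)
grad = binarize_backward(w_real, grad_out=np.ones_like(w_real))
print("binary weights:", w_bin)     # [ 1. -1.  1. -1.]
print("STE gradient mask:", grad)   # [1. 0. 1. 1.]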
LP-3DCNN: Unveiling Local Phase in 3D Convolutional Neural Networks
Traditional 3D Convolutional Neural Networks (CNNs) are computationally
expensive, memory intensive, and prone to overfitting; most importantly, their
feature learning capabilities need improvement. To address these issues,
we propose Rectified Local Phase Volume (ReLPV) block, an efficient alternative
to the standard 3D convolutional layer. The ReLPV block extracts the phase in a
3D local neighborhood (e.g., 3x3x3) of each position of the input map to obtain
the feature maps. The phase is extracted by computing the 3D Short-Term Fourier
Transform (STFT) at multiple fixed low-frequency points in the 3D local
neighborhood of each position. These feature maps at different frequency points
are then linearly combined after passing them through an activation function.
The ReLPV block provides significant parameter savings of at least 3^3 to 13^3
times compared to the standard 3D convolutional layer with filter sizes
3x3x3 to 13x13x13, respectively. We show that the feature learning capabilities
of the ReLPV block are significantly better than the standard 3D convolutional
layer. Furthermore, it produces consistently better results across different 3D
data representations. We achieve state-of-the-art accuracy on the volumetric
ModelNet10 and ModelNet40 datasets while utilizing only 11% parameters of the
current state-of-the-art. We also improve the state-of-the-art on the UCF-101
split-1 action recognition dataset by 5.68% (when trained from scratch) while
using only 15% of the parameters of the state-of-the-art. The project webpage
is available at https://sites.google.com/view/lp-3dcnn/home.
Comment: Accepted in CVPR 2019
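A rough NumPy sketch of the phase-extraction idea: compute a 3D FFT over each
3x3x3 neighborhood and keep the phase at a few fixed low-frequency bins. The
specific frequency points and any windowing used in the actual ReLPV block are
assumptions here.

import numpy as np

def local_phase(volume, freq_bins=((0, 0, 1), (0, 1, 0), (1, 0, 0))):
    # One phase map per frequency point, over all 3x3x3 neighborhoods.
    D, H, W = volume.shape
    out = np.zeros((len(freq_bins), D - 2, H - 2, W - 2))
    for z in range(D - 2):
        for y in range(H - 2):
            for x in range(W - 2):
                patch = volume[z:z+3, y:y+3, x:x+3]
                spectrum = np.fft.fftn(patch)       # 3D STFT of the patch
                for k, f in enumerate(freq_bins):
                    out[k, z, y, x] = np.angle(spectrum[f])
    return out

phase_maps = local_phase(np.random.rand(8, 8, 8))
print(phase_maps.shape)  # (3, 6, 6, 6): one phase map per frequency point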
Training Generative Adversarial Networks with Binary Neurons by End-to-end Backpropagation
We propose the BinaryGAN, a novel generative adversarial network (GAN) that
uses binary neurons at the output layer of the generator. We employ the
sigmoid-adjusted straight-through estimators to estimate the gradients for the
binary neurons and train the whole network by end-to-end backpropagation. The
proposed model is able to directly generate binary-valued predictions at test
time. We implement such a model to generate binarized MNIST digits and
experimentally compare the performance for different types of binary neurons,
GAN objectives and network architectures. Although the results are still
preliminary, we show that it is possible to train a GAN that has binary neurons
and that the use of gradient estimators can be a promising direction for
modeling discrete distributions with GANs. For reproducibility, the source code
is available at https://github.com/salu133445/binarygan
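A minimal sketch of a sigmoid-adjusted straight-through estimator as
described: a hard threshold in the forward pass with the sigmoid derivative as
the surrogate gradient in the backward pass. NumPy stands in for the autograd
framework, and the exact adjustment used in the paper may differ.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_forward(logits):
    # Forward pass: hard threshold of the sigmoid output.
    return (sigmoid(logits) > 0.5).astype(np.float64)

def binary_backward(logits, grad_out):
    # Backward pass: gradient of the sigmoid, not of the step function.
    s = sigmoid(logits)
    return grad_out * s * (1.0 - s)

logits = np.array([-2.0, 0.1, 3.0])
print("binary outputs:", binary_forward(logits))              # [0. 1. 1.]
print("surrogate grads:", binary_backward(logits, np.ones(3)))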
ReLU Code Space: A Basis for Rating Network Quality Besides Accuracy
We propose a new metric space of ReLU activation codes equipped with a
truncated Hamming distance, which establishes an isometry between its elements
and polyhedral bodies in the input space; these bodies have recently been shown
to be strongly related to safety, robustness, and confidence. This isometry allows
the efficient computation of adjacency relations between the polyhedral bodies.
Experiments on MNIST and CIFAR-10 indicate that information besides accuracy
might be stored in the code space.
Comment: In ICLR 2020 Workshop on Neural Architecture Search (NAS 2020)
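To make the construction concrete, the sketch below extracts a binary ReLU
activation code from a small random two-layer network and compares two inputs
with a truncated Hamming distance; the network, the code layout, and the
truncation length are illustrative, and the paper's precise definition of the
truncation may differ.

import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(16, 8)), rng.normal(size=(16, 16))

def relu_code(x):
    # Which ReLU units fire at each layer: one bit per unit.
    h1 = np.maximum(W1 @ x, 0.0)
    h2 = np.maximum(W2 @ h1, 0.0)
    return np.concatenate([(h1 > 0), (h2 > 0)]).astype(np.uint8)

def truncated_hamming(a, b, t=16):
    # Compare only the first t bits of the two codes.
    return int(np.sum(a[:t] != b[:t]))

x1, x2 = rng.normal(size=8), rng.normal(size=8)
print("distance:", truncated_hamming(relu_code(x1), relu_code(x2)))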
Learning Sparse & Ternary Neural Networks with Entropy-Constrained Trained Ternarization (EC2T)
Deep neural networks (DNN) have shown remarkable success in a variety of
machine learning applications. The capacity of these models (i.e., number of
parameters), endows them with expressive power and allows them to reach the
desired performance. In recent years, there has been increasing interest in
deploying DNNs on resource-constrained devices (e.g., mobile devices) with
limited energy, memory, and computational budgets. To address this problem, we
propose Entropy-Constrained Trained Ternarization (EC2T), a general framework
to create sparse and ternary neural networks which are efficient in terms of
storage (e.g., at most two binary masks and two full-precision values are
required to save a weight matrix) and computation (e.g., MAC operations are
reduced to a few accumulations plus two multiplications). This approach
consists of two steps. First, a super-network is created by scaling the
dimensions of a pre-trained model (i.e., its width and depth). Subsequently,
this super-network is simultaneously pruned (using an entropy constraint) and
quantized (that is, ternary values are assigned layer-wise) in a training
process, resulting in a sparse and ternary network representation. We validate
the proposed approach on the CIFAR-10, CIFAR-100, and ImageNet datasets,
showing its effectiveness in image classification tasks.
Comment: Proceedings of the CVPR'20 Joint Workshop on Efficient Deep Learning in Computer Vision. Code is available at https://github.com/d-becking/efficientCNN
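A minimal sketch of the storage argument: a ternary layer can be kept as two
binary masks plus two full-precision scaling factors. The magnitude-threshold
assignment rule below is an assumption for illustration, not the
entropy-constrained procedure of EC2T.

import numpy as np

def ternarize(W, threshold=0.3):
    # Two binary masks plus two scaling factors fully describe the layer.
    pos = W > threshold
    neg = W < -threshold
    w_p = W[pos].mean() if pos.any() else 0.0    # scale for positive weights
    w_n = -W[neg].mean() if neg.any() else 0.0   # scale for negative weights
    return pos, neg, w_p, w_n

W = np.random.normal(scale=0.5, size=(4, 4))
pos, neg, w_p, w_n = ternarize(W)
W_ternary = w_p * pos - w_n * neg                # reconstruct from the masks
print(f"sparsity: {1.0 - (pos | neg).mean():.0%}")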
A Neuromorphic Paradigm for Online Unsupervised Clustering
A computational paradigm based on neuroscientific concepts is proposed and
shown to be capable of online unsupervised clustering. Because it is an online
method, it is readily amenable to streaming real-time applications and is
capable of dynamically adjusting to macro-level input changes. All operations,
both training and inference, are localized and efficient. The paradigm is
implemented as a cognitive column that incorporates five key elements: 1)
temporal coding, 2) an excitatory neuron model for inference, 3)
winner-take-all inhibition, 4) a column architecture that combines excitation
and inhibition, and 5) localized training via spike-timing-dependent plasticity
(STDP). These elements are described and discussed, and a prototype column is
given. The prototype column is simulated with a semi-synthetic benchmark and is
shown to have performance characteristics on par with classic k-means.
Simulations reveal the inner operation and capabilities of the column with
emphasis on excitatory neuron response functions and STDP implementations.
Comment: Submitted to the 53rd IEEE/ACM International Symposium on Microarchitecture
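A toy Python sketch of the column's core loop: online winner-take-all with an
STDP-like local update that pulls the winner's weights toward the input spike
pattern. The binary input coding, neuron model, and learning rate are
illustrative simplifications of the five elements above, which is why the
behavior ends up resembling online k-means.

import numpy as np

rng = np.random.default_rng(1)
n_neurons, n_inputs, lr = 4, 16, 0.1
W = rng.random((n_neurons, n_inputs))

def wta_step(spikes):
    winner = int(np.argmax(W @ spikes))          # inhibition silences the rest
    W[winner] += lr * (spikes - W[winner])       # STDP-like local update
    return winner

for _ in range(100):
    pattern = (rng.random(n_inputs) > 0.5).astype(float)
    wta_step(pattern)
print("learned prototypes (rounded):\n", np.round(W, 1))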