TernaryNet: Faster Deep Model Inference without GPUs for Medical 3D Segmentation using Sparse and Binary Convolutions
Deep convolutional neural networks (DCNN) are currently ubiquitous in medical
imaging. While their versatility and high-quality results for common image
analysis tasks, including segmentation, localisation and prediction, are
astonishing, this large representational power comes at the cost of highly
demanding computational effort. This limits their practical applications for
image guided interventions and diagnostic (point-of-care) support using mobile
devices without graphics processing units (GPU). We propose a new scheme that
approximates both trainable weights and neural activations in deep networks by
ternary values and tackles the open question of backpropagation when dealing
with non-differentiable functions. Our solution enables the removal of the
expensive floating-point matrix multiplications throughout any convolutional
neural network and replaces them with energy- and time-saving binary operators
and population counts. Our approach, demonstrated using a fully-convolutional
network (FCN) for CT pancreas segmentation, leads to more
than 10-fold reduced memory requirements and we provide a concept for
sub-second inference without GPUs. Our ternary approximation obtains a high
accuracy (a Dice overlap of 71.0%, without any post-processing) that is
statistically equivalent to using networks with high-precision weights and
activations. We further demonstrate significant improvements in comparison to
binary quantisation and to a variant without our proposed ternary hyperbolic
tangent continuation. We present a key enabling technique for highly efficient
DCNN inference without GPUs that will help to bring the advances of deep
learning to practical clinical applications. It also holds great promise for
improving accuracy in large-scale medical data retrieval.
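To make the core idea concrete, here is a minimal NumPy sketch of a ternary quantiser, a smooth tanh-based surrogate that can stand in for the hard staircase during backpropagation, and a ternary dot product reduced to comparisons and population counts; the threshold delta, the steepness beta and the two-step tanh form are our illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def ternarize(x, delta=0.3):
    """Hard ternary quantiser: map values to {-1, 0, +1} via threshold delta."""
    return np.sign(x) * (np.abs(x) > delta)

def ternary_tanh(x, beta=4.0):
    """Smooth surrogate for the ternary staircase.

    The sum of two shifted tanh steps approaches {-1, 0, +1} as beta grows,
    keeping the activation differentiable during training (a continuation in
    the spirit of the abstract's ternary hyperbolic tangent)."""
    return 0.5 * (np.tanh(beta * (x - 0.5)) + np.tanh(beta * (x + 0.5)))

def ternary_dot(a, b):
    """Dot product of ternary vectors using only comparisons and popcounts.

    For a, b in {-1, 0, +1}: products are +1 where nonzero entries agree in
    sign and -1 where they disagree, so the result is a difference of two
    population counts, with no floating-point multiplications."""
    nz = (a != 0) & (b != 0)
    agree = nz & (a == b)
    disagree = nz & (a != b)
    return int(np.count_nonzero(agree)) - int(np.count_nonzero(disagree))
```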
Recent Advances in Convolutional Neural Network Acceleration
In recent years, convolutional neural networks (CNNs) have shown great
performance in various fields such as image classification, pattern
recognition, and multi-media compression. Two of the feature properties, local
connectivity and weight sharing, can reduce the number of parameters and
increase processing speed during training and inference. However, as the
dimension of data becomes higher and the CNN architecture becomes more
complicated, the end-to-end approach, or a pipeline combining a CNN with
other components, is computationally intensive, which limits the further
deployment of CNNs. It is therefore both necessary and urgent to accelerate
CNNs. In this paper, we first summarize the acceleration methods that apply
to, but are not limited to, CNNs by reviewing a broad variety of research
papers. We propose a taxonomy of acceleration methods in terms of three
levels, i.e. structure level, algorithm level, and implementation level. We
also analyze the acceleration methods in terms of CNN architecture
compression, algorithm optimization, and hardware-based improvement. Finally,
we discuss these acceleration and optimization methods from different
perspectives within each level. The discussion shows that the methods at each
level still leave a large space for exploration. By incorporating such a wide
range of disciplines, we expect to provide a comprehensive reference for
researchers who are interested in CNN acceleration.
Comment: submitted to Neurocomputing
DNN Feature Map Compression using Learned Representation over GF(2)
In this paper, we introduce a method to compress intermediate feature maps of
deep neural networks (DNNs) to decrease memory storage and bandwidth
requirements during inference. Unlike previous works, the proposed method is
based on converting fixed-point activations into vectors over GF(2), the
smallest finite field, followed by nonlinear dimensionality reduction (NDR) layers
embedded into a DNN. Such an end-to-end learned representation finds more
compact feature maps by exploiting quantization redundancies within the
fixed-point activations along the channel or spatial dimensions. We apply the
proposed network architectures derived from modified SqueezeNet and MobileNetV2
to the tasks of ImageNet classification and PASCAL VOC object detection.
Compared to prior approaches, the conducted experiments show a factor of 2
decrease in memory requirements with minor degradation in accuracy while adding
only bitwise computations.
Comment: CEFRL2018
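A minimal NumPy sketch of the representation change the abstract describes: unsigned fixed-point activations are unpacked into bit-planes, i.e. vectors over GF(2), on which learned NDR layers (not shown here) could then operate. The function names and the 8-bit width are our assumptions.

```python
import numpy as np

def to_gf2_planes(x_fixed, bits=8):
    """Unpack unsigned fixed-point activations into GF(2) bit-planes.

    x_fixed: integer array with values in [0, 2**bits).
    Returns an array with a trailing axis of length `bits`, each entry
    in {0, 1}, i.e. one binary vector per activation."""
    shifts = np.arange(bits)
    return (x_fixed[..., None] >> shifts) & 1

def from_gf2_planes(planes):
    """Repack bit-planes into fixed-point integers (inverse mapping)."""
    shifts = np.arange(planes.shape[-1])
    return (planes.astype(np.int64) << shifts).sum(axis=-1)

# Round trip on a mock 8-bit feature map of shape (channels, height, width):
x = np.random.randint(0, 256, size=(16, 8, 8))
assert (from_gf2_planes(to_gf2_planes(x)) == x).all()
```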
Optimize Deep Convolutional Neural Network with Ternarized Weights and High Accuracy
Deep convolutional neural networks have achieved great success in many
artificial intelligence applications. However, their enormous model size and
massive computation cost have become the main obstacles to deploying such
powerful algorithms in low-power, resource-limited embedded systems. As a
countermeasure, in this work we propose statistical weight scaling and
residual expansion methods that reduce the bit-width of the whole network's
weight parameters to ternary values (i.e. -1, 0, +1), with the objective of
greatly reducing model size, computation cost and the accuracy degradation
caused by model compression. With about a 16x model compression rate, our
ternarized ResNet-32/44/56 outperform their full-precision counterparts by
0.12%, 0.24% and 0.18% on the CIFAR-10 dataset. We also test our
ternarization method with AlexNet and ResNet-18 on the ImageNet dataset, both
of which achieve the best top-1 accuracy among recent similar works at the
same 16x compression rate. When further incorporating our residual expansion
method, our ternarized ResNet-18 even improves top-5 accuracy by 0.61% over
its full-precision counterpart and degrades top-1 accuracy by only 0.42% on
ImageNet, with an 8x model compression rate. It outperforms the recent
ABC-Net by 1.03% in top-1 accuracy and 1.78% in top-5 accuracy, with an
around 1.25x higher compression rate and more than 6x computation reduction
due to weight sparsity.
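A NumPy sketch of the flavour of ternarization the abstract describes, with a statistics-derived threshold and a per-layer scaling factor; the threshold factor 0.7 and the exact scaling rule are our assumptions, not necessarily the paper's.

```python
import numpy as np

def ternarize_weights(w, t=0.7):
    """Ternarize a weight tensor using layer statistics.

    Weights with magnitude below delta = t * mean(|w|) are zeroed; the rest
    map to +/-alpha, where alpha is the mean magnitude of the surviving
    weights, so the ternary tensor stays close to w in scale."""
    delta = t * np.abs(w).mean()
    mask = np.abs(w) > delta
    alpha = np.abs(w[mask]).mean() if mask.any() else 0.0
    return alpha * np.sign(w) * mask, alpha

w = np.random.randn(64, 3, 3, 3)          # a mock conv layer
w_t, alpha = ternarize_weights(w)
print(np.unique(w_t), alpha)              # values are {-alpha, 0, +alpha}
```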
BitSplit-Net: Multi-bit Deep Neural Network with Bitwise Activation Function
Significant computational cost and memory requirements for deep neural
networks (DNNs) make it difficult to utilize DNNs in resource-constrained
environments. The binary neural network (BNN), which uses binary weights and
binary activations, has been gaining interest for its hardware-friendly
characteristics and minimal resource requirements. However, BNNs usually suffer
from accuracy degradation. In this paper, we introduce "BitSplit-Net", a neural
network which maintains the hardware-friendly characteristics of BNN while
improving accuracy by using multi-bit precision. In BitSplit-Net, each bit of
multi-bit activations propagates independently throughout the network before
being merged at the end of the network. Thus, each bit path of BitSplit-Net
resembles a BNN, and hardware-friendly features of BNNs, such as the bitwise
binary activation function, are preserved in our scheme. We demonstrate that the
BitSplit version of LeNet-5, VGG-9, AlexNet, and ResNet-18 can be trained to
have similar classification accuracy at a lower computational cost compared to
conventional multi-bit networks with low bit precision (<= 4-bit). We further
evaluate BitSplit-Net on a GPU with a custom CUDA kernel, showing that
BitSplit-Net can achieve better hardware performance than conventional
multi-bit networks.
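A toy NumPy sketch of the bit-splitting idea: a k-bit activation map is decomposed into k binary maps that could propagate through separate binary paths before being recombined. The uniform quantisation and the power-of-two merge rule are our illustrative assumptions.

```python
import numpy as np

def bit_split(act, bits=2):
    """Split activations in [0, 1] into `bits` independent binary maps."""
    levels = 2**bits - 1
    q = np.round(np.clip(act, 0.0, 1.0) * levels).astype(np.int64)
    return [((q >> k) & 1).astype(np.float32) for k in range(bits)]

def bit_merge(planes):
    """Recombine the binary paths, weighting plane k by 2**k."""
    return sum((2**k) * p for k, p in enumerate(planes))

act = np.random.rand(4, 4)
planes = bit_split(act, bits=2)           # two BNN-like binary maps
restored = bit_merge(planes) / 3.0        # back to quantised [0, 1] values
assert np.allclose(restored, np.round(act * 3) / 3)
```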
WRPN: Wide Reduced-Precision Networks
For computer vision applications, prior works have shown the efficacy of
reducing numeric precision of model parameters (network weights) in deep neural
networks. Activation maps, however, occupy a large memory footprint during both
the training and inference steps when using mini-batches of inputs. One way to
reduce this large memory footprint is to reduce the precision of activations.
However, past works have shown that reducing the precision of activations hurts
model accuracy. We study schemes to train networks from scratch using
reduced-precision activations without hurting accuracy. We reduce the precision
of activation maps (along with model parameters) and increase the number of
filter maps in a layer, and find that this scheme matches or surpasses the
accuracy of the baseline full-precision network. As a result, one can
significantly improve the execution efficiency (e.g. reduce dynamic memory
footprint, memory bandwidth and computational energy) and speed up the training
and inference process with appropriate hardware support. We call our scheme
WRPN - wide reduced-precision networks. We report results showing that the
WRPN scheme achieves better accuracy on the ILSVRC-12 dataset than previously
reported reduced-precision networks, while being computationally less
expensive.
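The two ingredients the abstract combines, sketched in NumPy: a uniform k-bit quantiser for activations and a widening of the per-layer filter counts. The widening factor of 2 and the function names are our assumptions.

```python
import numpy as np

def quantize_k_bit(x, k):
    """Uniformly quantize values in [0, 1] to k bits (2**k levels)."""
    levels = 2**k - 1
    return np.round(np.clip(x, 0.0, 1.0) * levels) / levels

def widen(filter_counts, factor=2):
    """Increase the number of filter maps per layer to compensate for
    accuracy lost to reduced-precision activations and weights."""
    return [c * factor for c in filter_counts]

acts = quantize_k_bit(np.random.rand(8, 8), k=4)   # 4-bit activations
print(widen([64, 128, 256]))                       # [128, 256, 512]
```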
FINN-R: An End-to-End Deep-Learning Framework for Fast Exploration of Quantized Neural Networks
Convolutional Neural Networks have rapidly become the most successful machine
learning algorithm, enabling ubiquitous machine vision and intelligent
decisions even on embedded computing systems. While the underlying arithmetic
is structurally simple, compute and memory requirements are challenging. One of
the promising opportunities is leveraging reduced-precision representations for
inputs, activations and model parameters. The resulting scalability in
performance, power efficiency and storage footprint provides interesting design
compromises in exchange for a small reduction in accuracy. FPGAs are ideal for
exploiting low-precision inference engines leveraging custom precisions to
achieve the required numerical accuracy for a given application. In this
article, we describe the second generation of the FINN framework, an end-to-end
tool which enables design space exploration and automates the creation of fully
customized inference engines on FPGAs. Given a neural network description, the
tool optimizes for given platforms, design targets and a specific precision. We
introduce formalizations of resource cost functions and performance
predictions, and elaborate on the optimization algorithms. Finally, we evaluate
a selection of reduced precision neural networks ranging from CIFAR-10
classifiers to YOLO-based object detection on a range of platforms including
PYNQ and AWS F1, demonstrating unprecedented measured throughput of 50 TOp/s
on AWS F1 and 5 TOp/s on embedded devices.
Comment: to be published in ACM TRETS Special Edition on Deep Learning
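To give a feel for the kind of performance prediction such a framework automates, here is a toy cost model in Python; the parallelism knobs (PE and SIMD counts), the clock rate and the simple cycle formula are our assumptions, not FINN-R's actual resource cost functions.

```python
def layer_rate(macs, pe, simd, f_clk_hz):
    """Approximate inferences/s sustained by one layer of a dataflow
    accelerator: macs per input divided across pe*simd parallel lanes."""
    cycles_per_input = macs / (pe * simd)
    return f_clk_hz / cycles_per_input

def pipeline_rate(layers, f_clk_hz=200e6):
    """A dataflow pipeline is limited by its slowest stage, so design
    space exploration balances pe/simd allocations across layers."""
    return min(layer_rate(m, p, s, f_clk_hz) for (m, p, s) in layers)

# (macs, pe, simd) per layer for a hypothetical small CNN:
print(pipeline_rate([(230e6, 32, 16), (115e6, 16, 16), (58e6, 8, 16)]))
```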
A Survey of FPGA-Based Neural Network Accelerator
Recent research on neural networks has shown significant advantages in
machine learning over traditional algorithms based on handcrafted features
and models. Neural networks are now widely adopted in domains such as image,
speech and video recognition. However, the high computation and storage
complexity of neural network inference poses great difficulties for its
application. CPU platforms can hardly offer enough computation capacity. GPU
platforms are the first choice for neural network processing because of their
high computation capacity and easy-to-use development frameworks.
On the other hand, FPGA-based neural network inference accelerators are
becoming a popular research topic. With specifically designed hardware, FPGAs
are the next possible solution to surpass GPUs in speed and energy
efficiency. Various FPGA-based accelerator designs have been proposed with
software and hardware optimization techniques to achieve high speed and
energy efficiency. In this paper, we give an overview of previous work on
FPGA-based neural network inference accelerators and summarize the main
techniques used. An investigation from software to hardware, and from the
circuit level to the system level, is carried out to provide a complete
analysis of FPGA-based neural network inference accelerator design and to
serve as a guide for future work.
LUTNet: Rethinking Inference in FPGA Soft Logic
Research has shown that deep neural networks contain significant redundancy,
and that high classification accuracies can be achieved even when weights and
activations are quantised down to binary values. Network binarisation on FPGAs
greatly increases area efficiency by replacing resource-hungry multipliers with
lightweight XNOR gates. However, an FPGA's fundamental building block, the
K-LUT, is capable of implementing far more than an XNOR: it can perform any
K-input Boolean operation. Inspired by this observation, we propose LUTNet, an
end-to-end hardware-software framework for the construction of area-efficient
FPGA-based neural network accelerators using the native LUTs as inference
operators. We demonstrate that the exploitation of LUT flexibility allows for
far heavier pruning than possible in prior works, resulting in significant area
savings while achieving comparable accuracy. Against the state-of-the-art
binarised neural network implementation, we achieve twice the area efficiency
for several standard network models when performing inference on popular
datasets. We also demonstrate that even greater energy efficiency improvements
are obtainable.
Comment: Accepted manuscript uploaded 01/04/19. DOA 03/03/19
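A small Python illustration of the observation driving LUTNet: a K-input LUT evaluates an arbitrary Boolean function from a 2**K-entry truth table, of which XNOR is just one special case. The encoding below is our own.

```python
import numpy as np

def eval_lut(truth_table, inputs):
    """Evaluate a K-input LUT: `inputs` is a tuple of K bits and
    `truth_table` a length-2**K array giving the output for each input
    combination (index built MSB-first)."""
    idx = 0
    for b in inputs:
        idx = (idx << 1) | int(b)
    return int(truth_table[idx])

# XNOR is one of the 2**(2**2) = 16 functions a 2-LUT can realize:
xnor = np.array([1, 0, 0, 1])             # outputs for inputs 00,01,10,11
assert eval_lut(xnor, (1, 1)) == 1 and eval_lut(xnor, (1, 0)) == 0

# The same LUT could instead hold any learned Boolean function, e.g. one
# absorbing the effect of several pruned binary weights at once:
learned = np.array([0, 1, 1, 1])          # OR, as an arbitrary example
assert eval_lut(learned, (0, 1)) == 1
```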
The High-Dimensional Geometry of Binary Neural Networks
Recent research has shown that one can train a neural network with binary
weights and activations at train time by augmenting the weights with a
high-precision continuous latent variable that accumulates small changes from
stochastic gradient descent. However, there is a dearth of theoretical analysis
to explain why we can effectively capture the features in our data with binary
weights and activations. Our main result is that the neural networks with
binary weights and activations trained using the method of Courbariaux, Hubara
et al. (2016) work because of the high-dimensional geometry of binary vectors.
In particular, the ideal continuous vectors that extract features in the
intermediate representations of these BNNs are well-approximated by binary
vectors in the sense that dot products are approximately preserved. Compared to
previous research that demonstrated the viability of such BNNs, our work
explains why these BNNs work in terms of this high-dimensional geometry. Our
theory serves as
a foundation for understanding not only BNNs but a variety of methods that seek
to compress traditional neural networks. Furthermore, a better understanding of
multilayer binary neural networks serves as a starting point for generalizing
BNNs to other neural network architectures such as recurrent neural networks.
Comment: 12 pages, 4 figures
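The abstract's central claim, that binarization approximately preserves dot products in high dimensions, is easy to check numerically; the following NumPy snippet is our own demonstration, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4096                                   # a typical layer width
w = rng.standard_normal(d)                 # ideal continuous feature vector
x = rng.standard_normal(d)                 # an input
w_bin = np.sign(w)                         # binarized weights in {-1, +1}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# For Gaussian w, cosine(w, sign(w)) concentrates near sqrt(2/pi) ~ 0.798
# as d grows, so binarization barely rotates the vector's direction and
# normalized dot products with inputs are approximately preserved.
print(cosine(w, w_bin))                    # ~0.80
print(cosine(w, x), cosine(w_bin, x))      # two nearby values
```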