Accelerating Deterministic and Stochastic Binarized Neural Networks on FPGAs Using OpenCL
Recent technological advances have proliferated the available computing
power, memory, and speed of modern Central Processing Units (CPUs), Graphics
Processing Units (GPUs), and Field Programmable Gate Arrays (FPGAs).
Consequently, the performance and complexity of Artificial Neural Networks
(ANNs) are burgeoning. While GPU-accelerated Deep Neural Networks (DNNs)
currently offer state-of-the-art performance, they consume large amounts of
power. Training such networks on CPUs is inefficient, as data throughput and
parallel computation are limited. FPGAs are considered a suitable candidate for
performance-critical, low-power systems, e.g., Internet of Things (IoT) edge
devices. Using the Xilinx SDAccel or Intel FPGA SDK for OpenCL development
environment, networks described using the high-level OpenCL framework can be
accelerated on heterogeneous platforms. Moreover, the resource utilization and
power consumption of DNNs can be further reduced by employing regularization
techniques that binarize network weights. In this paper, we introduce, to the
best of our knowledge, the first FPGA-accelerated stochastically binarized DNN
implementations, and compare them to implementations accelerated using both
GPUs and FPGAs. Our developed networks are trained and benchmarked using the
popular MNIST and CIFAR-10 datasets, and achieve near state-of-the-art
performance, while offering a >16-fold improvement in power consumption,
compared to conventional GPU-accelerated networks. Both our FPGA-accelerated
deterministic and stochastic BNNs reduce inference times on MNIST and CIFAR-10
by >9.89x and >9.91x, respectively.

Comment: 4 pages, 3 figures, 1 table
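The two binarization schemes the abstract contrasts can be sketched in a few lines of NumPy. The hard-sigmoid probability used for the stochastic variant follows the common BinaryConnect-style formulation and is an assumption, not necessarily this paper's exact rule:

```python
import numpy as np

def binarize_deterministic(w):
    # Deterministic binarization: sign function, mapping zero to +1.
    return np.where(w >= 0, 1.0, -1.0)

def binarize_stochastic(w, rng=np.random.default_rng(0)):
    # Stochastic binarization: +1 with probability p = hard_sigmoid(w),
    # where hard_sigmoid(w) = clip((w + 1) / 2, 0, 1).
    p = np.clip((w + 1.0) / 2.0, 0.0, 1.0)
    return np.where(rng.random(w.shape) < p, 1.0, -1.0)

w = np.array([-1.5, -0.2, 0.0, 0.4, 2.0])
print(binarize_deterministic(w))  # [-1. -1.  1.  1.  1.]
print(binarize_stochastic(w))     # random signs, biased toward sign(w)
```

In training, such a binarizer is typically applied in the forward pass while full-precision weights are kept for the gradient update; that detail is outside this sketch.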
XNOR Neural Engine: a Hardware Accelerator IP for 21.6 fJ/op Binary Neural Network Inference
Binary Neural Networks (BNNs) are promising to deliver accuracy comparable to
conventional deep neural networks at a fraction of the cost in terms of memory
and energy. In this paper, we introduce the XNOR Neural Engine (XNE), a fully
digital configurable hardware accelerator IP for BNNs, integrated within a
microcontroller unit (MCU) equipped with an autonomous I/O subsystem and hybrid
SRAM / standard cell memory. The XNE is able to fully compute convolutional and
dense layers in autonomy or in cooperation with the core in the MCU to realize
more complex behaviors. We show post-synthesis results in 65nm and 22nm
technology for the XNE IP and post-layout results in 22nm for the full MCU
indicating that this system can drop the energy cost per binary operation to
21.6fJ per operation at 0.4V, and at the same time is flexible and performant
enough to execute state-of-the-art BNN topologies such as ResNet-34 in less
than 2.2mJ per frame at 8.9 fps.

Comment: 11 pages, 8 figures, 2 tables, 3 listings. Accepted for presentation
at CODES'18 and for publication in IEEE Transactions on Computer-Aided Design
of Integrated Circuits and Systems (TCAD) as part of the ESWEEK-TCAD special issue
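The core primitive an XNOR engine accelerates is the binary dot product: with weights and activations constrained to {-1, +1} and packed as bits, a multiply-accumulate collapses to XNOR followed by popcount. A minimal sketch, using a hypothetical bit-packing convention (bit = 1 means +1) rather than the XNE's actual datapath:

```python
def binary_dot(a_bits, b_bits, n):
    # a_bits, b_bits: n binary values packed into an int (bit i = 1 means +1, 0 means -1).
    # XNOR sets a bit where the two operands agree, i.e. where the product is +1.
    xnor = ~(a_bits ^ b_bits) & ((1 << n) - 1)
    pop = bin(xnor).count("1")
    # Each agreeing bit contributes +1, each disagreeing bit -1.
    return 2 * pop - n

# Example: a = [+1, -1, +1], b = [-1, +1, +1] (bit 0 is the first element)
print(binary_dot(0b101, 0b011, 3))  # -1, i.e. (+1)(+1) + (-1)(+1) + (+1)(-1)
```

On hardware, the XNOR and popcount are single wide gates rather than a loop, which is where the fJ-per-operation energy figures come from.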
Structured Binary Neural Networks for Image Recognition
We propose methods to train convolutional neural networks (CNNs) with both
binarized weights and activations, leading to quantized models that are
specifically friendly to mobile devices with limited power capacity and
computation resources. Previous works on quantizing CNNs often seek to
approximate the floating-point information using a set of discrete values,
which we call value approximation, typically assuming the same architecture as
the full-precision networks. Here we take a novel "structure approximation"
view of quantization -- it is very likely that different architectures designed
for low-bit networks may be better for achieving good performance. In
particular, we propose a "network decomposition" strategy, termed Group-Net, in
which we divide the network into groups. Thus, each full-precision group can be
effectively reconstructed by aggregating a set of homogeneous binary branches.
In addition, we learn effective connections among groups to improve the
representation capability. Moreover, the proposed Group-Net shows strong
generalization to other tasks. For instance, we extend Group-Net for accurate
semantic segmentation by embedding rich context into the binary structure.
Furthermore, for the first time, we apply binary neural networks to object
detection. Experiments on classification, semantic segmentation, and object
detection tasks demonstrate the superior performance of the proposed methods
over various quantized networks in the literature. Our methods outperform the
previous best binary neural networks in terms of accuracy and computational
efficiency.

Comment: 15 pages. Extended version of the conference version arXiv:1811.1041
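The idea of reconstructing a full-precision group by aggregating homogeneous binary branches can be illustrated with a greedy residual binarization of a weight vector. This is a toy sketch of the aggregation principle only, not Group-Net's actual layer-level decomposition or its learned inter-group connections:

```python
import numpy as np

def residual_binarize(w, k=3):
    # Approximate w as sum_i alpha_i * b_i with b_i in {-1, +1}^n:
    # each branch binarizes the residual left by the previous branches.
    residual = w.astype(float).copy()
    branches = []
    for _ in range(k):
        b = np.where(residual >= 0, 1.0, -1.0)
        alpha = np.mean(np.abs(residual))  # least-squares optimal scale for b
        branches.append((alpha, b))
        residual -= alpha * b
    return branches

w = np.array([0.9, -0.3, 0.5, -0.8])
approx = sum(a * b for a, b in residual_binarize(w))
print(np.round(approx, 3))  # close to w; the residual shrinks with each branch
```

Each branch is itself purely binary, so all k branches can run with XNOR-popcount arithmetic and their outputs are combined with k floating-point scales.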